PROCESS AND SYSTEM FOR MERGING TRACE DATA FOR PRIMARILY INTERPRETED METHODS

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation-in-part of the following copending and commonly assigned applications entitled "SYSTEM AND METHOD FOR PROVIDING TRACE INFORMATION REDUCTION", U.S. application Ser. No. 08/989,725, filed on Dec. 12, 1997, now U.S. Pat. No. 6,055,492, "A METHOD AND APPARATUS FOR STRUCTURED PROFILING OF DATA PROCESSING SYSTEMS AND APPLICATIONS", U.S. application Ser. No. 09/052,329, filed on Mar. 31, 1998, now U.S. Pat. No. 6,002,872, "METHOD AND APPARATUS FOR PROFILING PROCESSES IN A DATA PROCESSING SYSTEM", U.S. application Ser. No. 09/177,031, filed on Oct. 22, 1998, now U.S. Pat. No. 6,311,325, and "METHOD AND SYSTEM FOR MERGING EVENT-BASED DATA AND SAMPLED DATA INTO POSTPROCESSED TRACE OUTPUT", U.S. application Ser. No. 09/343,438, currently pending, filed Jun. 30, 1999.

Additionally, this application is related to U.S. patent application Ser. No. 09/052,331, filed Mar. 31, 1998, which issued as U.S. Pat. No. 6,158,024 and is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates to an improved data processing system and, in particular, to a method and apparatus for optimizing performance in a data processing system. Still more particularly, the present invention provides a method and apparatus for a software program development tool for enhancing performance of a software program through software profiling.

2. Description of Related Art

In analyzing and enhancing performance of a data processing system and the applications executing within the data processing system, it is helpful to know which software modules within a data processing system are using system resources. Effective management and enhancement of data processing systems requires knowing how and when various system resources are being used. Performance tools are used to monitor and examine a data processing system to determine resource consumption as various software applications are executing within the data processing system. For example, a performance tool may identify the most frequently executed modules and instructions in a data processing system, or may identify those modules which allocate the largest amount of memory or perform the most I/O requests. Hardware performance tools may be built into the system or added at a later point in time. Software performance tools also are useful in data processing systems, such as personal computer systems, which typically do not contain many, if any, built-in hardware performance tools.

One known software performance tool is a trace tool, which keeps track of particular sequences of instructions by logging certain events as they occur, so-called event-based profiling. For example, a trace tool may log every entry into, and every exit from, a module, subroutine, method, function, or system component. Alternately, a trace tool may log the requester and the amounts of memory allocated for each memory allocation request. Typically, a time stamped record is produced for each such event. Pairs of records similar to entry-exit records also are used to trace execution of arbitrary code segments, to record requesting and releasing locks, starting and completing I/O or data transmission, and for many other events of interest.

Another tool used involves program sampling to identify certain locations in programs in which the programs appear to spend large amounts of time, such as program hot spots. This technique is based on the idea of interrupting the application or data processing system execution at regular intervals, so-called sample-based profiling. In order to improve performance of code generated by various families of computers, it is often necessary to determine where time is being spent by the processor in executing code, such efforts being commonly known in the computer processing arts as locating "hot spots." Ideally, one would like to isolate such hot spots at the instruction and/or source line of code level in order to focus attention on areas which might benefit most from improvement to the code. At each interruption, the program counter of the currently executing thread, a process that is part of a larger process or program, is recorded. Typically, at post-processing time, these tools capture values that are resolved against a load map and symbol table information for the data processing system, and a profile of where the time is being spent is obtained from this analysis.

For example, isolating such hot spots to the instruction level permits compiler writers to find significant areas of suboptimal code generation, at which they may thus focus their efforts to improve code generation efficiency. Another potential use of instruction level detail is to provide guidance to the designer of future systems. Such designers employ profiling tools to find characteristic code sequences and/or instructions that require optimization for the available software for a given type of hardware.

Event-based profiling has limitations. For example, event-based profiling is expensive in terms of performance (an event per entry and per exit), which can and often does perturb the resulting view of performance. Additionally, this technique is not always available because it requires the static or dynamic insertion of entry/exit events into the code. This insertion of events is sometimes not possible or is often difficult. For example, if source code is unavailable for the to-be-instrumented code, event-based profiling may not be feasible. However, it is possible to instrument an interpreter of the source code to obtain event-based profiling information without changing the source code.

On the other hand, sample-based profiling provides only a "flat view" of system performance but does provide the benefits of reduced cost and reduced dependence on hooking capability.

Further, sample-based techniques do not identify where the time is spent in many small and seemingly unrelated functions or in situations in which no clear hot spot is apparent. Without an understanding of the program structure, it is not clear with a "flat" profile how to determine where the performance improvements can be obtained.

Therefore, it would be advantageous to provide both event-based and sample-based profiling of an application within the same time period. It would be particularly advantageous to provide the ability to enable and disable profiling of selected portions of a data processing system and to combine the output from different types of profiling into a single merged presentation.

SUMMARY OF THE INVENTION

The present invention provides a process and system for profiling code executing on a data processing system. Event-
based trace data is recorded in response to selected events, and the event-based trace data includes an indication of which code is being interrupted. The trace data may be processed to identify a thread or method that was executing during the event. A periodically occurring event is also detected, and a stack associated with the profiled code is identified in response to detection of the periodically occurring event.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts a distributed data processing system in which the present invention may be implemented;
FIGS. 2A-B are block diagrams depicting data processing systems in which the present invention may be implemented;
FIG. 3A is a block diagram depicting the relationship of software components operating within a computer system that may implement the present invention;
FIG. 3B is a block diagram depicting a Java virtual machine in accordance with a preferred embodiment of the present invention;
FIG. 4 is a block diagram depicting components used to profile processes in a data processing system;
FIG. 5 is an illustration depicting various phases in profiling the active processes in an operating system;
FIG. 6 is a flowchart depicting a process used by a trace program for generating trace records from processes executing on a data processing system;
FIG. 7 is a flowchart depicting a process used in a system interrupt handler trace hook;
FIG. 8 is a flowchart depicting a process used to generate trace data during the initialization phase and each time a class is loaded by the JVM;
FIG. 9 is a flowchart depicting a process used by a trace hook that reports when a class is unloaded;
FIG. 10A is a flowchart depicting a process used when a method is loaded;
FIG. 10B is a flowchart depicting a process used when a method is unloaded;
FIG. 11 is a flowchart depicting a process used by a trace hook that reports when a thread is initialized;
FIG. 12 is a flowchart depicting a process for processing trace records;
FIG. 13 is a diagram depicting a hash table;
FIG. 14 is a flowchart depicting a process for class and method processing;
FIG. 15 is a flowchart depicting a process for thread record processing;
FIG. 16 is a diagram depicting the call stack containing stack frames;
FIG. 17 is an illustration depicting a call stack sample;
FIG. 18A is a diagram depicting a program execution sequence along with the state of the call stack at each function entry/exit point;
FIG. 18B depicts a particular timer based sampling of the execution flow depicted in FIG. 18A;
FIGS. 18C-D are time charts providing an example of the types of time for which the profiling tool accounts;
FIG. 19A is a diagram depicting a tree structure generated from sampling a call stack;
FIG. 19B is a diagram depicting an event tree which reflects call stacks observed during system execution;
FIG. 20 is a table depicting a call stack tree;
FIG. 21 is a flow chart depicting a method for building a call stack tree using a trace text file as input;
FIG. 22 is a flow chart depicting a method for building a call stack tree dynamically as tracing is taking place during system execution;
FIG. 23 is a flowchart depicting a process for creating a call stack tree structure;
FIG. 24 is a flowchart depicting a process for identifying functions from an address obtained during sampling;
FIG. 25 is a diagram depicting a structured profile obtained using the processes of the present invention;
FIG. 26 is a diagram depicting a record generated using the processes of the present invention;
FIG. 27 is a diagram depicting another type of report that may be produced to show the calling structure between routines shown in FIG. 20;
FIG. 28 is a flowchart depicting the processing of a trace file containing both event-based and sample-based profiling information;
FIG. 29 is a table depicting a report generated from a trace file containing both event-based profiling information and sample-based profiling information (stack unwinds);
FIG. 30 is a table depicting major codes and minor codes that may be employed to instrument modules for profiling;
FIG. 31A is a flowchart depicting a process for inserting profile hooks into specific routines in real-time by updating the code for a software interrupt;
FIGS. 31B-C are examples of pseudo-assembly language code that depict the changes required for inserting profile hooks into specific routines in real-time by updating the code for a software interrupt;
FIGS. 32A-32C depict a series of tree structures generated from events and stack unwinds; and
FIG. 33 is a flowchart depicting a method by which a tree containing merged sample data and event data is constructed.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

With reference now to the figures, and in particular with reference to FIG. 1, a pictorial representation of a distributed data processing system in which the present invention may be implemented is depicted.

Distributed data processing system 100 is a network of computers in which the present invention may be implemented. Distributed data processing system 100 contains a network 102, which is the medium used to provide communications links between various devices and computers connected together within distributed data processing system 100. Network 102 may include permanent connections, such as wire or fiber optic cables, or temporary connections made through telephone connections.

In the depicted example, a server 104 is connected to network 102 along with storage unit 106. In addition, clients
108, 110, and 112 also are connected to network 102. These clients 108, 110, and 112 may be, for example, personal computers or network computers. For purposes of this application, a network computer is any computer, coupled to a network, which receives a program or other application from another computer coupled to the network. In the depicted example, server 104 provides data, such as boot files, operating system images, and applications to clients 108-112. Clients 108, 110, and 112 are clients to server 104. Distributed data processing system 100 may include additional servers, clients, and other devices not shown. In the depicted example, distributed data processing system 100 is the Internet with network 102 representing a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, government, educational, and other computer systems, that route data and messages. Of course, distributed data processing system 100 also may be implemented as a number of different types of networks, such as, for example, an Intranet or a local area network.

FIG. 1 is intended as an example, and not as an architectural limitation for the processes of the present invention.

With reference now to FIG. 2A, a block diagram of a data processing system which may be implemented as a server, such as server 104 in FIG. 1, is depicted in accordance with the present invention. Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors 202 and 204 connected to system bus 206. Alternatively, a single processor system may be employed. Also connected to system bus 206 is memory controller/cache 208, which provides an interface to local memory 209. I/O Bus Bridge 210 is connected to system bus 206 and provides an interface to I/O bus 212. Memory controller/cache 208 and I/O Bus Bridge 210 may be integrated as depicted.

Peripheral component interconnect (PCI) bus bridge 214 connected to I/O bus 212 provides an interface to PCI local bus 216. A modem 218 may be connected to PCI local bus 216. Typical PCI bus implementations will support four PCI expansion slots or add-in connectors. Communications links to network computers 108-112 in FIG. 1 may be provided through modem 218 and network adapter 220 connected to PCI local bus 216 through add-in boards.

Additional PCI bus bridges 222 and 224 provide interfaces for additional PCI buses 226 and 228, from which additional modems or network adapters may be supported. In this manner, server 200 allows connections to multiple network computers. A memory mapped graphics adapter 230 and hard disk 232 may also be connected to I/O bus 212 as depicted, either directly or indirectly.

Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 2A may vary. For example, other peripheral devices, such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted. The depicted example is not meant to imply architectural limitations with respect to the present invention.

The data processing system depicted in FIG. 2A may be, for example, an IBM RISC/System 6000 system, a product of International Business Machines Corporation in Armonk, New York, running the Advanced Interactive Executive (AIX) operating system.

With reference now to FIG. 2B, a block diagram of a data processing system in which the present invention may be implemented is illustrated. Data processing system 250 is an example of a client computer. Data processing system 250 employs a peripheral component interconnect (PCI) local bus architecture. Although the depicted example employs a PCI bus, other bus architectures such as Micro Channel and ISA may be used. Processor 252 and main memory 254 are connected to PCI local bus 256 through PCI Bridge 258. PCI Bridge 258 also may include an integrated memory controller and cache memory for processor 252. Additional connections to PCI local bus 256 may be made through direct component interconnection or through add-in boards. In the depicted example, local area network (LAN) adapter 260, SCSI host bus adapter 262, and expansion bus interface 264 are connected to PCI local bus 256 by direct component connection. In contrast, audio adapter 266, graphics adapter 268, and audio/video adapter (A/V) 269 are connected to PCI local bus 256 by add-in boards inserted into expansion slots. Expansion bus interface 264 provides a connection for a keyboard and mouse adapter 270, modem 272, and additional memory 274. SCSI host bus adapter 262 provides a connection for hard disk drive 276, tape drive 278, and CD-ROM 280 in the depicted example. Typical PCI local bus implementations will support three or four PCI expansion slots or add-in connectors.

An operating system runs on processor 252 and is used to coordinate and provide control of various components within data processing system 250 in FIG. 2B. The operating system may be a commercially available operating system such as JavaOS For Business™ or OS/2™, which are available from International Business Machines Corporation™. JavaOS is loaded from a server on a network to a network client and supports Java programs and applets. A couple of characteristics of JavaOS that are favorable for performing traces with stack unwinds, as described below, are that JavaOS does not support paging or virtual memory.

An object oriented programming system such as Java may run in conjunction with the operating system and provides calls to the operating system from Java programs or applications executing on data processing system 250. Instructions for the operating system, the object-oriented operating system, and applications or programs are located on storage devices, such as hard disk drive 276, and may be loaded into main memory 254 for execution by processor 252. Often, hard disk drives are absent and memory is constrained when data processing system 250 is used as a network client.

Those of ordinary skill in the art will appreciate that the hardware in FIG. 2B may vary depending on the implementation. For example, other peripheral devices, such as optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIG. 2B. The depicted example is not meant to imply architectural limitations with respect to the present invention. For example, the processes of the present invention may be applied to a multiprocessor data processing system.

The present invention provides a process and system for profiling software applications. Although the present invention may operate on a variety of computer platforms and operating systems, it may also operate within a Java runtime environment. Hence, the present invention may operate in conjunction with a Java virtual machine (JVM) yet within the boundaries of a JVM as defined by Java standard specifications. In order to provide a context for the present invention, portions of the operation of a JVM according to Java specifications are herein described.

With reference now to FIG. 3A, a block diagram illustrates the relationship of software components operating within a computer system that may implement the present
invention. Java-based system 300 contains platform specific operating system 302 that provides hardware and system support to software executing on a specific hardware platform. JVM 304 is one software application that may execute in conjunction with the operating system. JVM 304 provides a Java run-time environment with the ability to execute Java application or applet 306, which is a program, servlet, or software component written in the Java programming language. The computer system in which JVM 304 operates may be similar to data processing system 200 or computer 100 described above. However, JVM 304 may be implemented in dedicated hardware on a so-called Java chip, Java-on-silicon, or Java processor with an embedded picoJava core.

At the center of a Java run-time environment is the JVM, which supports all aspects of Java's environment, including its architecture, security features, mobility across networks, and platform independence.

The JVM is a virtual computer, i.e. a computer that is specified abstractly. The specification defines certain features that every JVM must implement, with some range of design choices that may depend upon the platform on which the JVM is designed to execute. For example, all JVMs must execute Java bytecodes and may use a range of techniques to execute the instructions represented by the bytecodes. A JVM may be implemented completely in software or somewhat in hardware. This flexibility allows different JVMs to be designed for mainframe computers and PDAs.

The JVM is the name of a virtual computer component that actually executes Java programs. Java programs are not run directly by the central processor but instead by the JVM, which is itself a piece of software running on the processor. The JVM allows Java programs to be executed on a different platform as opposed to only the one platform for which the code was compiled. Java programs are compiled for the JVM. In this manner, Java is able to support applications for many types of data processing systems, which may contain a variety of central processing units and operating systems architectures. To enable a Java application to execute on different types of data processing systems, a compiler typically generates an architecture-neutral file format; the compiled code is executable on many processors, given the presence of the Java run-time system. The Java compiler generates bytecode instructions that are nonspecific to a particular computer architecture. A bytecode is a machine independent code generated by the Java compiler and executed by a Java interpreter. A Java interpreter is part of the JVM that alternately decodes and interprets a bytecode or bytecodes. These bytecode instructions are designed to be easy to interpret on any computer and easily translated on the fly into native machine code. Bytecodes may be translated into native code by a just-in-time compiler, or JIT.

A JVM must load class files and execute the bytecodes within them. The JVM contains a class loader, which loads class files from an application and the class files from the Java application programming interfaces (APIs) which are needed by the application. The execution engine that executes the bytecodes may vary across platforms and implementations.

One type of software-based execution engine is a just-in-time compiler. With this type of execution, the bytecodes of a method or determination are compiled to native machine code upon successful fulfillment of some type of criteria for jitting a method. The native machine code for the method is then cached and reused upon the next invocation of the method. The execution engine may also be implemented in hardware and embedded on a chip so that the Java bytecodes are executed natively. JVMs usually interpret bytecodes, but JVMs may also use other techniques, such as just-in-time compiling, to execute bytecodes.

Interpreting code provides an additional benefit. Rather than instrumenting the Java source code, the interpreter may be instrumented. Trace data may be generated via selected events and timers through the instrumented interpreter without modifying the source code. Profile instrumentation is discussed in more detail further below.

When an application is executed on a JVM that is implemented in software on a platform-specific operating system, a Java application may interact with the host operating system by invoking native methods. A Java method is written in the Java language, compiled to bytecodes, and stored in class files. A native method is written in some other language and compiled to the native machine code of a particular processor. Native methods are stored in a dynamically linked library whose exact form is platform specific.

With reference now to FIG. 3B, a block diagram of a JVM is depicted in accordance with a preferred embodiment of the present invention. JVM 350 includes a class loader subsystem 352, which is a mechanism for loading types, such as classes and interfaces, given fully qualified names. JVM 350 also contains runtime data areas 354, execution engine 356, native method interface 358, and memory management 374. Execution engine 356 is a mechanism for executing instructions contained in the methods of classes loaded by class loader subsystem 352. Execution engine 356 may be, for example, Java interpreter 362 or just-in-time compiler 360. Native method interface 358 allows access to resources in the underlying operating system. Native method interface 358 may be, for example, a Java native interface.

Runtime data areas 354 contain native method stacks 364, Java stacks 366, PC registers 368, method area 370, and heap 372. These different data areas represent the organization of memory needed by JVM 350 to execute a program.

Java stacks 366 are used to store the state of Java method invocations. When a new thread is launched, the JVM creates a new Java stack for the thread. The JVM performs only two operations directly on Java stacks: it pushes and pops frames. A thread's Java stack stores the state of Java method invocations for the thread. The state of a Java method invocation includes its local variables, the parameters with which it was invoked, its return value, if any, and intermediate calculations. Java stacks are composed of stack frames. A stack frame contains the state of a single Java method invocation. When a thread invokes a method, the JVM pushes a new frame onto the Java stack of the thread. When the method completes, the JVM pops the frame for that method and discards it. The JVM does not have any registers for holding intermediate values; any Java instruction that requires or produces an intermediate value uses the stack for holding the intermediate values. In this manner, the Java instruction set is well-defined for a variety of platform architectures.

PC registers 368 are used to indicate the next instruction to be executed. Each instantiated thread gets its own pc register (program counter) and Java stack. If the thread is executing a JVM method, the value of the pc register indicates the next instruction to execute. If the thread is executing a native method, then the contents of the pc register are undefined.

Native method stacks 364 store the state of invocations of native methods. The state of native method invocations is stored in an implementation-dependent way in native
method stacks, registers, or other implementation-dependent memory areas. In some JVM implementations, native method stacks 364 and Java stacks 366 are combined.

Method area 370 contains class data while heap 372 contains all instantiated objects. The JVM specification strictly defines data types and operations. Most JVMs choose to have one method area and one heap, each of which is shared by all threads running inside the JVM. When the JVM loads a class file, it parses information about a type from the binary data contained in the class file. It places this type information into the method area. Each time a class instance or array is created, the memory for the new object is allocated from heap 372. JVM 350 includes an instruction that allocates memory space within the memory for heap 372 but includes no instruction for freeing that space within the memory. Memory management 374 in the depicted example manages memory space within the memory allocated to heap 372. Memory management 374 may include a garbage collector which automatically reclaims memory used by objects that are no longer referenced. Additionally, a garbage collector also may move objects to reduce heap fragmentation.

The present invention provides both event-based profiling and sample-based profiling of an application within the same time period, as described in more detail further below. The processes within the figures may be categorized in an attempt to gain an overall perspective of the many processes employed within the present invention: processes that generate event-based profiling information in the form of specific types of records in a trace file; processes that generate sample-based profiling information in the form of specific types of records in a trace file; processes that read the trace records to generate more useful information to be placed into profile reports; and processes that generate the profile reports for the user of the profiling utility.

With reference now to FIG. 4, a block diagram depicts components used to profile processes in a data processing system. A trace program 400, also referred to as Java time profiler, is used to profile processes 402. Trace program 400 may be used to record data upon the execution of a hook, which is a specialized piece of code at a specific location in a routine or program in which other routines may be connected. Trace hooks are typically inserted for the purpose of debugging, performance analysis, or enhancing functionality. These trace hooks are employed to send trace data to trace program 400, which stores the trace data in buffer 404. The trace data in buffer 404 may be stored in a file for post-processing. With Java operating systems, the present

existing threads, all loaded classes, and all methods for the loaded classes. Records from trace data captured from hooks are written to indicate thread switches, interrupts, and loading and unloading of classes and jitted methods. Any class which is loaded has trace records that indicate the name of the class and its methods. In the depicted example, four byte IDs are used as identifiers for threads, classes, and methods. These IDs are associated with names output in the records. A record is written to indicate when all of the start up information has been written.

During the profiling phase, trace records are written to a trace file. Trace records may originate from two types of profiling actions: event-based profiling and sample-based profiling. In the present invention, the trace file may have a combination of event-based records, such as those that may originate from a trace hook executed in response to a particular type of event, e.g., a method entry or method exit, and sample-based records, such as those that may originate from a stack walking function executed in response to a timer interrupt, e.g., a stack unwind record, also called a call stack record.

For example, the following process may occur during the profiling phase if the user of the profiling utility has requested sample-based profiling information. Each time a particular type of timer interrupt occurs, a trace record is written, which indicates the system program counter. This system program counter may be used to identify the routine that is interrupted. In the depicted example, a timer interrupt is used to initiate gathering of trace data. Of course, other types of interrupts may be used other than timer interrupts. Interrupts based on a programmed performance monitor event or other types of periodic events may be employed.

In the post-processing phase 504, the data collected in the buffer is sent to a file for post-processing. In one configuration, the file may be sent to a server, which determines the profile for the processes on the client machine. Of course, depending on available resources, the post-processing also may be performed on the client machine. In post-processing phase 504, B-trees and/or hash tables are employed to maintain names associated with IDs as the records in the trace file are processed. A hash table employs hashing to convert an identifier or a key, meaningful to a user, into a value for the location of the corresponding data in the table. While processing trace records, the B-trees and/or hash tables are updated to reflect the current state of the client machine, including newly loaded jitted code or unloaded code. Also, in the post-processing phase 504, each trace record is processed in a serial manner. As soon as the indicator is encountered that all of the start up information has been processed, event-based trace records

invention employs trace hooks that aid in identifying inter- 5Q fr om trace hooks and sample-based trace records from timer 

preted methods that may be used in processes 402. Id interrupts are then processed. Timer interrupt information 

addition, since classes may be loaded and unloaded, these from the timer interrupt records are resolved with existing 

changes also are identified using trace data in accordance hash tables. In addition, this information identifies the thread 

with a preferred embodiment of the present invention. This an d function being executed. The data is stored in hash 

is especially relevant with "network client** data processing 5S ( a b]es with a count identifying the number of timer tick 

systems, such as those that may operate under JavaOS, since occurrences associated with each way of looking at the data, 

classes and jitted methods may be loaded and unloaded more After all of the trace records are processed, the information 

frequently due to the constrained memory and role as a is formatted for output in the form of a report, 

network client. Alternatively, trace information may be processed on-the- 

With reference now to FIG. 5, a diagram depicts various 60 fly so that trace data structures are maintained during the 

phases in profiling the proccsscs active in an operating profiling phase. In other words, while a profiling function, 

system. Subject to memory constraints, the generated trace such as a timer interrupt, is executing, rather than (or in 

output may be as long and as detailed as the analyst requires addition to) writing trace records to a file, the trace record 

for the purpose of profiling a particular program. information is processed to construct and maintain any 

An initialization phase 500 is used to capture the state of 65 appropriate data structures, 

the client machine at the time tracing is initiated. This trace For example, during the processing of a timer interrupt 

initialization data includes trace records that identify all during the profiling phase, a determination could be made as 
to whether the code being interrupted is being interpreted by the Java interpreter. If the code being interrupted is interpreted, the method ID of the method being interpreted may be placed in the trace record. In addition, the name of the method may be obtained and placed in the appropriate B-tree. Once the profiling phase has completed, the data structures may contain all the information necessary for generating a profile report without the need for postprocessing of the trace file.

With reference now to FIG. 6, a flowchart depicts a process used by a trace program for generating trace records from processes executing on a data processing system. FIG. 6 provides further detail concerning the generation of trace records that were not described with respect to FIG. 5.

Trace records are produced by the execution of small pieces of code called "hooks". Hooks may be inserted in various ways into the code executed by processes, including statically (source code) and dynamically (through modification of a loaded executable). This process is employed after trace hooks have already been inserted into the process or processes of interest. The process begins by allocating a buffer (step 600), such as buffer 404 in FIG. 4. Next, in the depicted example, trace hooks are turned on (step 602), and tracing of the processes on the system begins (step 604). Trace data is received from the processes of interest (step 606). This type of tracing may be performed during phases 500 and/or 502. This trace data is stored as trace records in the buffer (step 608). A determination is made as to whether tracing has finished (step 610). Tracing finishes when the trace buffer has been filled or the user stops tracing via a command and requests that the buffer contents be sent to file. If tracing has not finished, the process returns to step 602 as described above.

Otherwise, when tracing is finished, the buffer contents are sent to a file for post-processing (step 612). A report is then generated in post-processing (step 614) with the process terminating thereafter.

Although the depicted example uses post-processing to analyze the trace records, the processes of the present invention may be used to process trace information in real-time depending on the implementation.

With reference now to FIG. 7, a flowchart depicts a process that may be used during an interrupt handler trace hook.

The process begins by obtaining a program counter (step 700). Typically, the program counter is available in one of the saved program stack areas. Thereafter, a determination is made as to whether the code being interrupted is interpreted code (step 702). This determination may be made by determining whether the program counter is within an address range for the interpreter used to interpret bytecodes. If the code being interrupted is interpreted, a method block address is obtained for the code being interpreted. A trace record is then written (step 706). The trace record is written by sending the trace information to a trace program, such as trace program 400, which generates trace records for post-processing in the depicted example. This trace record is referred to as an interrupt record, or an interrupt hook.

This type of trace may be performed during phase 502. Alternatively, a similar process, i.e., determining whether code that was interrupted is interpreted code, may occur during postprocessing of a trace file, as is shown below with respect to FIG. 15.

With reference now to FIG. 8, a flowchart illustrates a process used to generate trace data during the initialization phase and each time a class is loaded by the JVM. This process is employed during the initialization phase for each class that is loaded. In addition, the steps in the flowchart shown in FIG. 8 are also used each time a class is loaded during profiling or tracing of processes.

The process begins by obtaining the identification of the [...] the name of the class (step 800). Then, the number of methods associated with the class are obtained (step 802) and a counter is initialized to this value. A class trace record is then written using the information (step 803). This information includes a trace record indicating the class block address and number of methods. A counter is set equal to the number of methods (step 804). Next, a determination is made as to whether the counter is equal to zero (step 806). If the counter is equal to zero, the process terminates thereafter. If the counter is non-zero, then a trace record is written for each method as follows. A method block address is identified for the next method (step 808). A flag is obtained that indicates whether the method is jitted (step 810). This flag may be obtained from a table used in a JVM. A method has been "jitted" when the bytecodes for the method have been compiled into native machine language instructions for use on the client data processing system on which the method is to be executed. The address at which the jitted code is located is the compiled address.

Then, a determination is made as to whether a compiled address is present (step 812). This determination is made by examining the flag obtained in step 810. If a compiled address is present, the compiled address is retrieved from the JIT's table, which is an internal control block used to manage the space allocated to jitted code (step 814). A name is then obtained for the method (step 816). If the process proceeds directly to this step from step 812, then a compiled address is not present. A trace record indicating the method information, such as the method block address, flags, jitted address, and method name, is written (step 817).

Thereafter, the counter is decremented (step 818), and a determination is made as to whether the counter is equal to zero (step 820). If the counter is equal to zero, the process terminates. The trace records are startup records if the process was used during the initialization phase. If the process is employed during the loading of a class, the records are load class and/or method records. The process will then terminate because all of the methods associated with the class will have been processed. On the other hand, if the counter is not equal to zero, the process will return to step 806.

With reference now to FIG. 9, a flowchart depicts a process used by a trace hook to generate a trace record that records when a class is unloaded. This process is employed each time a class is unloaded. The process begins by identifying the class block address for the class that is unloaded (step 900). Then, a trace record is written (step 902) with the process terminating thereafter. The generated record is a class unload record.

With reference now to FIG. 10A, a flowchart depicts a process used by a trace hook to generate a trace record when a method is jitted. The process identifies the method being jitted (step 1010) and writes a trace record identifying the method block address and the compiled address of the method being jitted (step 1020), after which the process terminates.

With reference now to FIG. 10B, a flowchart depicts a process used by a trace hook to generate a trace record when a jitted method is unloaded. The process identifies the jitted method being unloaded (step 1000) and writes a trace record (step 1002) with the process terminating thereafter. Either
the method block address or the compiled address may be used to identify the method.

With reference now to FIG. 11, a flowchart depicts a process used by a trace hook that records when a thread is initialized. The process begins by obtaining a thread ID for the thread being initialized (step 1100). The name of the thread is obtained (step 1102), and a trace record is then written (step 1104) with the process terminating thereafter.

With reference now to FIG. 12, a flowchart depicts a process for processing trace records during the postprocessing phase. The process begins by determining whether a valid trace file is present (step 1200). If a valid trace file is not present, the process terminates. Otherwise, a determination is made as to whether a valid symbol file is present (step 1202). A symbol file contains symbolic information such as function names. A symbol file may be generated by running the Unix command "nm". The process also terminates if a valid symbol file is not present. If the symbol file is valid, then the symbols are loaded into a B-tree ordered by addresses (step 1203), and the trace file is read to obtain trace records for processing (step 1204).

A determination is made as to whether the trace record is a startup record or a load class record (step 1206). If the record is a startup record or a load class record, class and method processing is employed (step 1208) with the process then determining whether additional records are present for processing (step 1210). Step 1208 is described in more detail in FIG. 14 below. If more records are present, the process obtains the next record for processing (step 1212) and returns to step 1206. Otherwise, a report is generated (step 1214) with the process terminating thereafter.

With reference again to step 1206, if the trace record is not a class startup record or a load class record, the process determines whether the trace record is a load jitted method record (step 1216). If so, then the process adds the method address and name to the B-tree (step 1218). If the trace record is not a load jitted method record, then a determination is made as to whether the trace record is an unload jitted method record (step 1220). If so, then the process removes the method address and name from the B-tree (step 1222). If the trace record is not an unload jitted method record, then the process determines whether the trace record is an unload class record (step 1224). If the trace record is an unload class record, the process then removes entries for methods associated with the class in a class methods hash table and vector (step 1226) with the process then proceeding to step 1210 as described above. A vector is similar to a hash table except that a vector identifies an ordered relationship between vector elements.

Referring again to step 1224, if the trace record is not an unload class record, a determination is then made as to whether the trace record is a timer tick record (step 1228). If so, then the timer tick record is processed (step 1230) with the process then returning to step 1210. Step 1230 is described in more detail in FIG. 15 below. If no more records are present, a report is generated (step 1214), and the process terminates.

With reference now to FIG. 13, a diagram depicts a hash table. Hash table 1300 includes data 1302, which is accessed using key 1304. Key 1304 is converted using hashing into a value for the location of data 1302 within hash table 1300. Hash table 1300 is an example of a hash table that may be used to implement the class methods hash table in step 1218 in FIG. 12. A vector is similar to a hash table except that a vector identifies an ordered relationship between vector elements.

With reference now to FIG. 14, a flowchart depicts a process for class trace record and method trace record postprocessing. This figure is a more detailed diagram of step 1208 in FIG. 12. The process begins by obtaining the class block address from the class record (step 1400). Then, the method count is obtained, and the counter is set equal to the method count (step 1402), and the name of the class is obtained (step 1404). The information in steps 1402 and 1404 are obtained from the trace records. A class method hash table is updated if new methods are present when the method count is [...] step 1402 (step 1406). Method names are collected from the class in the trace record (step 1408). A determination is made as to whether the counter is equal to zero (step 1410). If the counter is equal to zero, the process terminates. Otherwise, the next method is retrieved from the trace record, and the counter is decremented (step 1412). Information for each method is in a separate trace record. A method block address is obtained for the method in the method record (step 1414). Information about the method is placed in a method values hash table with the method block address used as a key with a string consisting of the ClassName.MethodName plus a signature as the object or data in the hash table (step 1416). A flag is obtained from the record to determine whether the method has been jitted (step 1418). A determination is made as to whether the method is jitted (step 1420). If the method is jitted, the jitted address is obtained from the trace record (step 1422). An entry is created in the vector table with the method block address as the key and the jitted address as the object or data referenced by the key (step 1424) with the process terminating thereafter. The process also terminates from step 1420 if the method is not jitted.

With reference now to FIG. 15, a flowchart depicts a process for timer tick record postprocessing. FIG. 15 is a more detailed description of step 1230 in FIG. 12. The process begins by determining whether the thread is on an interrupt level (step 1500). If the thread is not on an interrupt level, the process obtains the thread name (step 1502). If the thread is on an interrupt level, the thread is identified as the current interrupt level (step 1504). In either case, a determination is then made as to whether or not the program counter is within the interpreter (step 1506).

If the program counter is not within the interpreter, then the B-tree is used to look up the address at or precedingly closest to the program counter in the B-tree to get the name of the method that was interrupted in order to generate the timer tick record (step 1508). The process then continues to step 1518.

With reference again to step 1506, if the program counter is within the interpreter, the process then obtains the method block address from the record (step 1514). Thereafter, the method name is obtained from the method values hash table (step 1516).

Thereafter, a determination is made as to whether the method and/or thread/method, also referred to as an object, is present in the count hash table (step 1518). If the object is present, the count for the object is incremented with the process terminating thereafter (step 1520). Otherwise, the object is added to the count hash table (step 1522), and the count for the object is set equal to one (step 1524) with the process terminating thereafter. The object may be, for example, a method or thread concatenated with the method name.

As noted previously, some of the figures describe a set of processes that may be employed to obtain event-based
profiling information. As applications execute, profiling information in the form of trace records may be written to a buffer or file. The trace records may then be post-processed.

In addition to event-based profiling, a set of processes may be employed to obtain sample-based profiling information. As applications execute, the applications may be periodically interrupted in order to obtain information about the current runtime environment. This information may be written to a buffer or file for postprocessing, or the information may be processed on-the-fly into data structures representing an ongoing history of the runtime environment. FIGS. 16 and 17 describe sample-based profiling in more detail.

A sample-based profiler obtains information from the stack of an interrupted thread. The thread is interrupted by a timer interrupt presently available in many operating systems. The user of the trace facility selects either the program counter option or the stack unwind option, which may be accomplished by enabling one major code or another major code, as described further below. This timer interrupt is employed to sample information from a call stack. By walking back up the call stack, a complete call stack can be obtained for analysis. A "stack walk" may also be described as a "stack unwind", and the process of "walking the stack" may also be described as "unwinding the stack." Each of these terms illustrates a different metaphor for the process. The process can be described as "walking" as the process must obtain and process the stack frames step-by-step. The process may also be described as "unwinding" as the process must obtain and process the stack frames that point to one another, and these pointers and their information must be "unwound" through many pointer dereferences.

The stack unwind records the sequence of functions/method calls at the time of the interrupt. A call stack is an ordered list of routines plus offsets within routines (i.e., modules, functions, methods, etc.) that have been entered during execution of a program. For example, if routine A calls routine B, and then routine B calls routine C, while the processor is executing instructions in routine C, the call stack is ABC. When control returns from routine C back to routine B, the call stack is AB. For more compact presentation and ease of interpretation within a generated report, the names of the routines are presented without any information about offsets. Offsets could be used for more detailed analysis of the execution of a program, however, offsets are not considered further herein.

Thus, during timer interrupt processing or at postprocessing, the generated sample-based profile information reflects a sampling of call stacks, not just leaves of the possible call stacks, as in some program counter sampling techniques. Leaves are nodes in the call stack tree structure, described further below, that are the farthest distance from the root node, also referred to as the primary node. In other words, a leaf is a node at the end of a branch (or a node that has no descendants). A descendant is a child of a parent node, and a leaf is a node that has no children.

With reference now to FIG. 16, a diagram depicts the call stack containing stack frames. A "stack" is a region of reserved memory in which a program or programs store status data, such as procedure and function call addresses, passed parameters, and sometimes local variables. A "stack frame" is a portion of a thread's stack that represents local storage (arguments, return addresses, return values, and local variables) for a single function invocation. Every active thread of execution has a portion of system memory allocated for its stack space. A thread's stack consists of sequences of stack frames. The set of frames on a thread's stack represent the state of execution of that thread at any time. Since stack frames are typically interlinked (e.g., each stack frame points to the previous stack frame), it is often possible to trace back up the sequence of stack frames and develop the "call stack". A call stack represents all not-yet-completed function calls. In other words, it reflects the function invocation sequence at any point in time.

Call stack 1600 includes information identifying the routine that is currently running, the routine that invoked it, and so on all the way up to the main program. Call stack 1600 includes a number of stack frames 1602, 1604, 1606, and 1608. In the depicted example, stack frame 1602 is at the top of call stack 1600, while stack frame 1608 is located at the bottom of call stack 1600. The top of the call stack is also referred to as the "root". The timer interrupt (found in most operating systems) is modified to obtain the program counter value (pcv) of the interrupted thread, together with the pointer to the currently active stack frame for that thread. In the Intel architecture, this is typically represented by the contents of registers: EIP (program counter) and EBP (pointer to stack frame). By accessing the currently active stack frame, it is possible to take advantage of the (typical) stack frame linkage convention in order to chain all of the frames together. Part of the standard linkage convention also dictates that the function return address be placed just above the invoked-function's stack frame; this can be used to ascertain the address for the invoked function. While this discussion employs an Intel-based architecture, this example is not a restriction. Most architectures employ linkage conventions that can be similarly navigated by a modified profiling interrupt handler.

When a timer interrupt occurs, the first parameter acquired is the program counter value. The next value is the pointer to the top of the current stack frame for the interrupted thread. In the depicted example, this value would point to EBP 1608a in stack frame 1608. In turn, EBP 1608 points to EBP 1606a in stack frame 1606, which in turn points to EBP 1604a in stack frame 1604. In turn, this EBP points to EBP 1602a in stack frame 1602. Within stack frames 1602-1608 are EIPs 1602b-1608b, which identify the calling routine's return address. The routines may be identified from these addresses. Thus, routines are defined by collecting all of the return addresses by walking up or backwards through the stack.

With reference now to FIG. 17, an illustration of a call stack is depicted. A call stack, such as call stack 1700, is obtained by walking the call stack. A call stack is obtained each time a periodic event, such as, for example, a timer interrupt occurs. These call stacks may be stored as call stack unwind trace records within the trace file for postprocessing or may be processed on-the-fly while the program continues to execute.

In the depicted example, call stack 1700 contains a pid 1702, which is the process identifier, and a tid 1704, which is the thread identifier. Call stack 1700 also contains addresses addr1 1706, addr2 1708 . . . addrN 1710. In this example, addr1 1706 represents the value of the program counter at the time of the interrupt. This address occurs somewhere within the scope of the interrupted function. addr2 1708 represents an address within the process that called the function that was interrupted. For Intel-processor-based data processing systems, it represents the return address for that call; decrementing that value by 4 results in the address of the actual call, also known as the call-site. This corresponds with EIP 1608b in FIG. 16. addrN 1710 is the top of the call stack (EIP 1602b). The call stack that would be returned if the timer interrupt interrupted the
thread whose call stack state is depicted in FIG. 16 would consist of: a pid, which is the process id of the interrupted thread; a tid, which is the thread id for the interrupted thread; a pcv, which is a program counter value (not shown on FIG. 16) for the interrupted thread; EIP 1608b; EIP 1606b; EIP 1604b; and EIP 1602b. In terms of FIG. 17, pcv=addr1, EIP 1608b=addr2, EIP 1606b=addr3, EIP 1604b=addr4, EIP 1602b=addr5.

many places in the program, it might be useful to know how much of the time spent in routine B was on behalf of (or when called by) routine A and how much of the time was on behalf of other routines. The sample-based profiling described herein attempts to provide some information about the routines in which a program spends some time when event-based trace records do not capture all of the desired information.
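The stack walk of FIG. 16 and the resulting call stack record of FIG. 17 can be illustrated with a minimal sketch. It is not the patented implementation: the names `CallStackRecord` and `unwind`, the dictionary standing in for memory, the concrete addresses, and the use of a zero saved EBP to mark the outermost frame are all assumptions made for illustration. The sketch follows the linkage convention described in the text, in which each frame stores the caller's EBP and the return address sits one 4-byte word above it.

```python
from dataclasses import dataclass
from typing import Dict, List

WORD = 4  # assumed 32-bit word size, as in the Intel example in the text


@dataclass
class CallStackRecord:
    """Hypothetical in-memory form of a call stack unwind record (FIG. 17)."""
    pid: int
    tid: int
    addresses: List[int]  # addr1 = program counter value, addr2..addrN = return addresses


def unwind(memory: Dict[int, int], pid: int, tid: int, pcv: int, ebp: int,
           max_depth: int = 64) -> CallStackRecord:
    """Walk the chain of saved EBP values, collecting one return address per frame."""
    addresses = [pcv]                          # addr1: where the thread was interrupted
    depth = 0
    while ebp != 0 and depth < max_depth:      # assumption: a saved EBP of 0 ends the chain
        addresses.append(memory[ebp + WORD])   # return address just above the saved EBP
        ebp = memory[ebp]                      # follow the link to the caller's frame
        depth += 1
    return CallStackRecord(pid, tid, addresses)


# Toy memory image with four frames, mirroring frames 1608 (bottom) to 1602 (top).
# All addresses are invented for the example.
memory = {
    0x1000: 0x2000, 0x1004: 0x401608,  # frame 1608: saved EBP, return address (EIP 1608b)
    0x2000: 0x3000, 0x2004: 0x401606,  # frame 1606
    0x3000: 0x4000, 0x3004: 0x401604,  # frame 1604
    0x4000: 0x0000, 0x4004: 0x401602,  # frame 1602: outermost frame
}

record = unwind(memory, pid=7, tid=3, pcv=0x401700, ebp=0x1000)
print(record.pid, record.tid, [hex(a) for a in record.addresses])
```

The five collected addresses correspond to addr1 through addr5 in the mapping given above: the program counter value first, followed by the return addresses from frame 1608 up to frame 1602. The `max_depth` guard is an added safety measure against a corrupted frame chain and is not part of the description.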

With reference now to FIG. 18A, a diagram of a program execution sequence along with the state of the call stack at each function entry/exit point is provided. The illustration shows entries and exits occurring at regular time intervals, but this is only a simplification for the illustration. If each function (A, B, C, and X in the figure) were instrumented with entry/exit event hooks, then complete accounting of the time spent within and below each function would be readily obtained. Note in FIG. 18A that at time 0, the executing thread is in routine C. The call stack at time 0 is C. At time 1, routine C calls routine A, and the call stack becomes CA and so on. It should be noted that the call stack in FIG. 18A is a reconstructed call stack that is generated by processing the event-based trace records in a trace file to follow such events as method entries and method exits. The use of call stack unwind records in conjunction with the use of a reconstructed call stack from event-based trace records is described in more detail further below.

The accounting technique and data structure are described in more detail further below. Unfortunately, this type of instrumentation can be expensive, can introduce bias and in some cases can be hard to apply. Sample-based profiling, by sampling the program's call stack, helps to alleviate the performance bias (and other complications) that entry/exit hooks produce.

Consider FIG. 18B, in which the same program is executed, but is being sampled on a regular basis (in the example, the interrupt occurs at a frequency equivalent to two timestamp values). Each sample includes a snapshot of the interrupted thread's call stack. Not all call stack combinations are seen with this technique (note that routine X does not show up at all in the set of call stack samples in FIG. 18B). This is an acceptable limitation of sampling. The idea is that with an appropriate sampling rate (e.g., 30-1000 times per second), the call stacks in which most of the time is spent will be identified. It does not really matter if some call stacks are omitted, provided these call stacks are combinations for which little time is consumed.

In the event-based traces, there is a fundamental assumption that the traces contain information about routine entries and matching routine exits. Often, entry-exit pairs are nested in the traces because routines call other routines. Time spent (or memory consumed) between entry into a routine and exit from the same routine is attributed to that routine, but a user of a profiling tool may want to distinguish between time

A fundamental concept in the output provided by the methods described herein is the call stack. The call stack consists of the routine that is currently running, the routine that invoked it, and so on all the way up to main. The sample-based profiler may add one level above that with the pid/tid (the process Ids and thread Ids). However, an attempt is made to follow the trace event records, such as method entries and exits, as shown in FIG. 18A, to reconstruct the structure of the call stack frames while the program was executing at various times during the trace.

The present invention can provide a report consisting of three kinds of time spent in a routine, such as routine A: (1) base time: the time spent executing code in routine A itself; (2) cumulative time (shortened to cum time): the time spent executing in routine A plus all the time spent executing every routine that routine A calls (and all the routines they call, etc.); and (3) wall-clock time or elapsed time. This type of timing information may be obtained from event-based trace records as these records have timestamp information for each record.

A routine's cum time is the sum of all the time spent executing the routine plus the time spent executing any other routine while that routine is below it on the call stack. In the example above in FIG. 18C, routine A's base time is 2 ms, and its cum time is 10 ms. Routine B's base time is 8 ms, and its cum time is also 8 ms because it does not call any other routines. It should be noted that cum time may not be generated if a call stack tree is being generated on-the-fly; cum time may only be computed after the fact during the postprocessing phase of a profile utility.

For wall-clock or elapsed time, if while routine B was running, the system fielded an interrupt or suspended this thread to run another thread, or if routine B blocked waiting on a lock or I/O, then routine B and all the entries above routine B on the call stack accumulate elapsed time but not base or cum time. Base and cum time are unaffected by interrupts, dispatching, or blocking. Base time only increases while a routine is running, and cum time only increases while the routine or a routine below it on the call stack is running.

In the example in FIG. 18C, routine A's elapsed time is the same as its cum time, 10 ms. Changing the example slightly, suppose there was a 1 ms interrupt in the middle of B, as shown in FIG. 18D. Routine A's base and cum time are unchanged at 2 ms and 10 ms, but its elapsed time is now 11

spent directly in a routine and time spent in other routines 55 ms. 

that it calls. Although base, cum and elapsed in terms of processor 

FIG. 18C shows an example of the manner in which time time spent in routines, sample based profiling is useful for 

may be expended by two routine: a program's main calls attributing consumption of almost any system resource to a 

routine A at time t equal to zero; routine A computes for 1 set of routines, as described in more detail below with 

ms and then calls routine B; routine B computes for 8 ms and 60 respect to FIG. 19B. Referring to FIG. 18C again, if routine 

then returns to routine A; routine A computes for 1 ms and A initiated two disk 1/O's, and that routine B initiated three 

then returns to main. From the point of view of main, routine more I/O's when called by routine A, routine A's "base 

A took 10 ms to execute, but most of that time was spent I/O's" are two and routine A's "cum I/O's" are five, 

executing instructions in routine B and was not spent "Elapsed I/O's" would be all I/O's, including those by other 

executing instructions within routine A. This is a useful 65 threads and processes, that occurred between entry to routine 

piece of information for a person attempting to optimize the A and exit from routine A. More general definitions for the 

example program. In addition, if routine B is called from accounting concepts during profiling would be the follow- 
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ing: base — the amount of the tracked system resource con- lime 4, and is exited at time 7. Thus, the first statistic 

sumed directly by this routine; cum — the amount of the indicates that this particular call stack, CAB, is produced 

tracked system resource consumed by this routine and all twice in the trace. The second statistic indicates that call 

routines below it on the call stack; elapsed — the total stack Cab exists for three units of time (at lime 2, time 4, and 

amount of the tracked system resource consumed (by any S time 6). The third statistic indicates the cumulative amount 

routine) between entry to this routine and exit from the of time spent in call stack CAB and those call stacks invoked 

routine from call stack CAB (I.e. those call slacks having CAB as 

a * j « iti^o it>A ior^ j n. .i_ u a prefix, in this case CABB). The cumulative time in the 

As noted above, FIGS. 18A-18D describe the process by J ^ ^ ^ nG m £ four ^ of ame ^ 

wtuch a reconstructed call stack may be generated by recursion depth of call stack CAB is one, as none of the three 

processing the event-based trace records in a trace file by 10 pKseat ^ ^ ^ ^ ^ ^ KCW ^ vely 

following such events as method entries and method exits. entered 

The use of call stack unwind records from sample-based ~ , ni , - . . . . . 

. ... - « „ Those skilled m the art will appreciate that the tree 

profiling in conjunction with the use of a reconstructed call . « • iad , l • • „ 

r , r* 3 J . , structure depicted in MG. 1VB may be implemented in a 

stack from event-based trace reco^ described in more w ^ of J ^ a varf of ^ of statistics 

S™^^ 'towwrthrespec 1G.^ Hence although 15 m fee maintained al each node Iq ^ described 

FIGS 19A-22 describe call stack trees that may be appk- embodimcnt ^ each node m me tee mntaias data md mt _ 

cable to processing sample-based trace records the descnp- ^ ^ ^ ^ name of ^ ^ ^ 

turn below for generatog or reconstruclmg call stacfa and ^ ^ four statistics above f 

call stack trees in FIGS. 19A-22 is mainly directed to the Qther of slatistical mformation may be stored at each 

processing of event-based trace records. 20 ^ ^ ^ cmbodimcnt> ^ pointcrs fof cacfa 

With reference now to FIG. 19A, a diagram depicts a tree node i nc i u d e a pointer to the node's parent, a pointer to the 

structure generated from trace data. This figure illustrates a first M{d of the node (i c thc lcft . most child ), a pointer to 

call stack tree 1900 in which each node in tree structure 1900 me next sibling of me nod6j ^ a pointer t0 me next 

represents a function entry point. instance of a given routine in the tree. For example, in FIG. 

Additionally, in each node in tree structure 1900, a 19B, node 1954 would contain a parent pointer to node 
number of statistics are recorded. In the depicted example, 1956, a first child pointer to node 1958, a next sibling pointer 
each node, nodes 1902-1908, contains an address (addr), a equa l to NULL (note that node 1954 does not have a next 
base time (BASE), cumulative time (CUM) and parent and sibling), and a next instance pointer to node 1962. Those 
children pointers. As noted above, this type of timing 3o skilled in the art will appreciate that other pointers may be 
information may be obtained from event-based trace records stored to make subsequent analysis more efficient. In 
as these records have timestamp information for each record. addition, other structural elements, such as tables for thc 
The address represents a function entry point. The base time properties of a routine that are invariant across instances 
represents the amount of time consumed directly by this (e.g., the routine's name), may also be stored, 
thread executing this function. The cumulative time is the 35 -j^s type of performance information and statistics main- 
amount of time consumed by this thread executing this ta in e d at eacD no d e are not constrained to time-based 
function and all functions below it on the caU stack. In the performance statistics. The present invention may be used to 
depicted example, pointers are included for each node. One present many types of trace information in a compact 
pointer is a parent pointer, a pointer to the node's parent. manner which supports performance queries. For example, 
Each node also contains a pointer to each child of the node. ^ ralner ^ m keeping statistics regarding time, tracing may be 

Those of ordinary skill in the art will appreciate that tree used to track the number of Java bytecodes executed in each 

structure 1900 may be implemented in a variety of ways and method (i.e. routine) called. The tree structure of the present 

that many different types of statistics may be maintained at invention would then contain statistics regarding bytecodes 

the nodes other than those in the depicted example. executed rather than time. In particular, the quantities 

The call stack is developed from looking back at all return 45 recorded in the second and third categories would reflect the 

addresses. These return addresses will resolve within the number of bytecodes executed rather than the amount of 

bodies of those functions. This information allows for time spent in each method. 

accounting discrimination between distinct invocations of Tracing may also be used to track memory allocation and 

the same function. In other words, if function X has 2 deallocation. Every time a routine creates an object, a trace 

distinct calls to function A, the time associated with those 50 record could be generated. The tree structure of the present 

calls can be accounted for separately. However, most reports invention would then be used to efficiently store and retrieve 

would not make this distinction. information regarding memory allocation. Each node would 

With reference now to FIG. 19B, a call stacktree which represent the number of method calls, the amount of 

reflects call stacks observed during a specific example of memory allocated within a method, the amount of memory 

system execution will now be described. At each node in the 55 allocated by methods called by the method, and the number 

tree, several statistics are recorded. In the example shown in of methods above this instance (i.e. the measure of 

FIG. 19B, the statistics are time-based statistics. The par- recursion). Those skilled in the art will appreciate that the 

ticular statistics shown include the number of distinct times tree structure of the present invention may be used to 

the call stack is produced, the sum of the time spent in the represent a variety of performance data in a manner which 

call stack, the total time spent in the call stack plus the time 60 & compact, and allows a wide variety of performance 

in those call stacks invoked from this call stack (referred to queries to be performed. 

as cumulative time), and the number of instances of this The tree structure shown in FIG. 19B depicts one way in 

routine above this instance (indicating depth of recursion). which data may be pictorially presented to a user. The same 

For example, at node 1952 in FIG. 19B, the call stack is data may also be presented to a user in tabular form as shown 

CAB, and the statistics kept for this node are 2:3:4:1. Note 65 in FIG. 20. 

that call stack CAB is first produced at time 2 in FIG. 18A, With reference now to FIG. 20, a call stack tree presented 

and is exited at time 3. Call stack CAB is produced again at as a table will now be described. Note that FIG. 20 contains 
a routine, pt_pidtid, which is the main process/thread which
calls routine C. Table 20 includes columns of data for Level
2030, RL 2032, Calls 2034, Base 2036, Cum 2038, and
Indent 2040. Level 2030 is the tree level (counting from the
root as level 0) of the node. RL 2032 is the recursion level.
Calls 2034 is the number of occurrences of this particular
call stack, i.e. the number of times this distinct call stack
configuration occurs. Base 2036 is the total observed time in
the particular call stack, i.e. the total time that the stack had
exactly these routines on the stack. Cum 2038 is the total
time in the particular call stack plus deeper levels below it.
Indent 2040 depicts the level of the tree in an indented
manner. From this type of call stack configuration
information, it is possible to infer each unique call stack
configuration, how many times the call stack configuration
occurred, and how long it persisted on the stack. This type
of information also provides the dynamic structure of a
program, as it is possible to see which routine called which
other routine. However, there is no notion of time-order in
the call stack tree. It cannot be inferred that routines at a
certain level were called before or after other routines on the
same level.
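
As an illustration only, the tabular view can be sketched in Python as a depth-first walk that emits one row per node with the Level, RL, Calls, Base, Cum and Indent columns named above. The `Node` class, the `rows` helper, and every statistic other than the 2:3:4:1 values quoted for call stack CAB are hypothetical; the recursion level is assumed here to be one plus the number of same-named routines higher on the stack, which matches the CAB example but is not spelled out by the text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    calls: int = 0
    base: int = 0
    children: list = field(default_factory=list)

def cum(node):
    """Base of this node plus the base of every node beneath it."""
    return node.base + sum(cum(c) for c in node.children)

def rows(node, level=0, above=()):
    """Yield (Level, RL, Calls, Base, Cum, Indent) for each node, depth first."""
    rl = 1 + above.count(node.name)  # assumed definition of recursion level
    yield (level, rl, node.calls, node.base, cum(node), "-" * level + node.name)
    for child in node.children:
        yield from rows(child, level + 1, above + (node.name,))

# Call stack CAB carries 2:3:4:1 as in the text; the other numbers are made up.
cabb = Node("B", calls=1, base=1)
cab = Node("B", calls=2, base=3, children=[cabb])
ca = Node("A", calls=1, base=2, children=[cab])
c = Node("C", calls=1, base=1, children=[ca])
root = Node("pt_pidtid", calls=1, base=0, children=[c])

table = list(rows(root))
for level, rl, calls, base, cum_time, indent in table:
    print(f"{level:>5} {rl:>2} {calls:>5} {base:>4} {cum_time:>4}  {indent}")
```

Because the rows come from a depth-first walk, siblings appear in insertion order only, which is consistent with the remark that the tree carries no notion of time-order.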

The pictorial view of the call stack tree, as illustrated in
FIG. 19B, may be built dynamically or built statically using
a trace text file or binary file as input. FIG. 21 depicts a flow
chart of a method for building a call stack tree using a trace
text file as input. In FIG. 21, the call stack tree is being built
to illustrate module entry and exit points.

With reference now to FIG. 21, it is first determined if
there are more trace records in the trace text file (step 2150).
If so, several pieces of data are obtained from the trace
record, including the time, whether the event is an enter or
a return, and the module name (step 2152). Next, the last
time increment is attributed to the current node in the tree
(step 2154). A check is made to determine if the trace record
is an enter or an exit record (step 2156). If it is an exit record,
the tree is traversed to the parent (using the parent pointer),
and the current tree node is set equal to the parent node (step
2158). If the trace record is an enter record, a check is made
to determine if the module is already a child node of the
current tree node (step 2160). If not, a new node is created
for the module and it is attached to the tree below the current
tree node (step 2162). The tree is then traversed to the
module's node, and the current tree node is set equal to the
module node (step 2164). The number of calls to the current
tree node is then incremented (step 2166). This process is
repeated for each trace record in the trace output file, until
there are no more trace records to parse (step 2168).
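
The loop of FIG. 21 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `(time, kind, module)` tuple layout, the `Node` class and the `build_tree` name are hypothetical, and the trace records are the FIG. 18C scenario (main calls A at 0 ms, A calls B at 1 ms, B returns at 9 ms, A returns at 10 ms) written out by hand.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}   # module name -> child Node
        self.calls = 0
        self.base = 0        # time spent with exactly this call stack

    def cum(self):
        """Base time of this node plus everything beneath it."""
        return self.base + sum(c.cum() for c in self.children.values())

def build_tree(records, root_name="main"):
    """Build a call stack tree from (time, kind, module) trace records."""
    root = current = Node(root_name)
    last_time = None
    for time, kind, module in records:
        # Step 2154: attribute the last time increment to the current node.
        if last_time is not None:
            current.base += time - last_time
        last_time = time
        if kind == "exit":
            # Step 2158: move to the parent (stay put if already at the root).
            if current.parent is not None:
                current = current.parent
        else:
            # Steps 2160-2166: find or create the child, descend, count the call.
            child = current.children.get(module)
            if child is None:
                child = Node(module, current)
                current.children[module] = child
            current = child
            current.calls += 1
    return root

trace = [(0, "enter", "A"), (1, "enter", "B"), (9, "exit", "B"), (10, "exit", "A")]
tree = build_tree(trace)
a = tree.children["A"]
b = a.children["B"]
print(a.base, a.cum(), b.base, b.cum())
```

For this trace the sketch reproduces the figures quoted for FIG. 18C: routine A has a base time of 2 ms and a cum time of 10 ms, and routine B has 8 ms for both.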

With reference now to FIG. 22, a flow chart depicts a
method for building a call stack tree dynamically as tracing
is taking place during system execution. In FIG. 22, as an
event is logged, it is added to the tree in real time. Preferably,
a call stack tree is maintained for each thread. The call stack
tree reflects the call stacks recorded to date, and a current
tree node field indicates the current location in a particular
tree. When an event occurs (step 2270), the thread ID is
obtained (step 2271). The time, type of event (i.e. in this
case, whether the event is a method entry or exit), the name
of the module (i.e. method), location of the thread's call
stack, and location of the thread's "current tree node" are
then obtained (step 2272). The last time increment is
attributed to the current tree node (step 2274). A check is
made to determine if the trace event is an enter or an exit
event (step 2276). If it is an exit event, the tree is traversed
to the parent (using the parent pointer), and the current tree
node is set equal to the parent node (step 2278). At this point,
the tree can be dynamically pruned in order to reduce the
amount of memory dedicated to its maintenance (step 2279).
Pruning is discussed in more detail below. If the trace event
is an enter event, a check is made to determine if the module
is already a child node of the current tree node (step 2280).
If not, a new node is created for the module and it is attached
to the tree below the current tree node (step 2282). The tree
is then traversed to the module's node, and the current tree
node is set equal to the module node (step 2284). The
number of calls to the current tree node is then incremented
(step 2286). Control is then passed back to the executing
module, and the dynamic tracing/reduction program waits
for the next event to occur (step 2288).

One of the advantages of using the dynamic tracing/
reduction technique described in FIG. 22 is its enablement
of long-term system trace collection with a finite memory
buffer. Very detailed performance profiles may be obtained
without the expense of an "infinite" trace buffer. Coupled
with dynamic pruning, the method depicted in FIG. 22 can
support a fixed-buffer-size trace mechanism.

The use of dynamic tracing and reduction (and dynamic
pruning in some cases) is especially useful in profiling the
performance characteristics of long running programs. In the
case of long running programs, a finite trace buffer can
severely impact the amount of useful trace information that
may be collected and analyzed. By using dynamic tracing
and reduction (and perhaps dynamic pruning), an accurate
and informative performance profile may be obtained for a
long running program.

Dynamic pruning is not required to use the method of the
present invention. Many long-running applications reach a
type of steady-state, where every possible routine and call
stack is present in the tree and updating statistics. Thus, trace
data can be recorded and stored for such applications
indefinitely within the constraints of a bounded memory
requirement. Pruning has value in reducing the memory
requirement for those situations in which the call stacks are
actually unbounded. For example, unbounded call stacks are
produced by applications that load and run other
applications.

Pruning can be performed in many ways, and a variety of
pruning criteria are possible. For example, pruning decisions
may be based on the amount of cumulative time attributed
to a subtree. Note that pruning may be disabled unless the
amount of memory dedicated to maintaining the call stack
exceeds some limit. As an exit event is encountered (such as
step 2278 in FIG. 22), the cumulative time associated with
the current node is compared with the cumulative time
associated with the parent node. If the ratio of these two
cumulative times does not exceed a pruning threshold (e.g.,
0.1), then the current node and all of its descendants are
removed from the tree. The algorithm to build the tree
proceeds as before by traversing to the parent, and changing
the current node to the parent.

Many variations of the above pruning mechanism are
possible. For example, the pruning threshold can be raised or
lowered to regulate the level of pruning from very
aggressive to none. More global techniques are also possible,
including a periodic sweep of the entire call stack tree,
removing all subtrees whose individual cumulative times are
not a significant fraction of their parent node's cumulative
times.

The performance data reduction of the present invention
allows analysis programs to easily and quickly answer many
questions regarding how computing time was spent within
the traced program. This information may be gathered by
"walking the tree" and accumulating the data stored at
various nodes within the call stack tree, from which it can be
determined the amount of time spent strictly within routine
A, the total amount of time spent in routine A and in the
routines called by routine A either directly or indirectly, etc.
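
Two of the tree walks discussed above, the per-routine accumulation and the periodic pruning sweep, can be sketched in Python. The sketch rests on several assumptions that the text does not settle: the `Node`, `totals` and `prune` names and the example tree are hypothetical, cum time for a recursive routine is counted only at its outermost instance to avoid double counting, and the time held by a pruned subtree is simply discarded.

```python
class Node:
    def __init__(self, name, base=0, children=()):
        self.name = name
        self.base = base
        self.children = list(children)

def cum(node):
    """Base time of this node plus everything beneath it."""
    return node.base + sum(cum(c) for c in node.children)

def totals(node, out=None, above=()):
    """Walk the tree and accumulate base and cum time per routine name."""
    out = {} if out is None else out
    entry = out.setdefault(node.name, {"base": 0, "cum": 0})
    entry["base"] += node.base
    if node.name not in above:      # count cum once per outermost instance
        entry["cum"] += cum(node)
    for child in node.children:
        totals(child, out, above + (node.name,))
    return out

def prune(node, threshold=0.1):
    """Sweep variant: drop subtrees whose cum / parent cum does not exceed threshold."""
    parent_cum = cum(node)          # measured before any child is removed
    node.children = [c for c in node.children
                     if parent_cum and cum(c) / parent_cum > threshold]
    for child in node.children:
        prune(child, threshold)

# Hypothetical tree: main -> A (2) -> B (8), plus a small routine D (1).
root = Node("main", 0, [Node("A", 2, [Node("B", 8)]), Node("D", 1)])
report = totals(root)
prune(root)
print(report["A"], [c.name for c in root.children])
```

With the 0.1 threshold quoted above, routine D (1 of 11 time units) is removed while A and B survive; the report taken before pruning shows 2 units spent strictly within A and 10 units in A and the routines it calls.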

With reference now to FIG. 23, a flowchart depicts a
process for creating a call stack tree structure from call stack
unwind records (sample-based trace records) in a trace file.
FIGS. 18A-22 above primarily showed the processes
involved in generating a call stack tree from event-based
trace records, which show events such as method entries and
method exits. These types of trace records allow a call stack
to be generated, usually during a postprocessing phase of the
profile tool or utility. Using timer interrupts, a profiling
function may walk an active call stack to generate a call
stack unwind trace record. FIG. 23 describes a process for
combining the information in a call stack unwind trace
record into a call stack tree. The call stack tree may have
been previously constructed from other call stack unwind
trace records or from event-based trace records according to
the methods described in FIGS. 19A-22.

The process begins by reading a call stack unwind record
(step 2300). This step processes the call stack information in
the record to determine what routines are or were executing
when the timer interrupt occurs or occurred, depending on
whether the call stack unwind record is being processed
on-the-fly or is being postprocessed. A sample-based
profiling function avoids, through the call stack unwind, the
need for adding additional instructions to the programs,
which affects the performance and time spent in routines.
Next, the tree structure for this process/thread (pid, tid) is
located (step 2302). Then, the pointer (PTR) is set to the root
of this tree structure by setting PTR=root(pid, tid) (step
2304). The index is set equal to N, which is the number of
entries in the call stack (step 2306).

A determination is made as to whether the index is equal 
to zero (step 2308). If the index is equal to zero, the process 
then returns to determine whether additional call stack 
unwind trace records are present for processing (step 2310). 
If additional call stack unwind trace records are present, the 
process then returns to step 2300 to read another call stack 
unwind trace record. Otherwise, the process terminates. 

On the other hand, if the index is not equal to zero, the
process then sets sample_address equal to the
call_stack_address[index] (step 2312). The B-tree is then used to
look up the address to get a routine name (step 2313). Next,
a determination is made as to whether PTR.child.name for
any child of PTR is equal to the looked-up routine name
(step 2314). In other words, this step determines whether the
routine name has ever been seen at this level in the tree
structure. If the address has never been seen at this level in
the tree structure, a new child of PTR is created and the
PTR.child.name is set equal to the routine name, the variable
PTR.child.BASE for the node is set equal to zero, and the
variable PTR.child.CUM for the node is set equal to zero
(step 2316). Thereafter, the cumulative time for the node is
incremented by incrementing the variable PTR.child.CUM
(step 2318). The process also proceeds to step 2318 from
step 2314 if the address has been seen at this level. In the
case of sample-based trace records, the "cumulative" time
represents the number of times that this particular call stack
configuration has been processed.

Next, a determination is made as to whether the sample
address, sample_address, is equal to the last address in the call
stack sample, call_stack_address[1] (step 2320). If the
sample address is equal to the address being processed, the
base time for the node is incremented by incrementing the
variable PTR.child.BASE (step 2322). The pointer PTR is
then set equal to the child (step 2324), and the index is
decremented (step 2326) with the process then returning to
step 2308 as previously described. With reference again to
step 2320, if the sample address is not equal to the address
being processed, the process then proceeds to step 2324.
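
The per-record loop of FIG. 23 (steps 2304 through 2326) can be sketched in Python as shown below. The names `Node`, `add_sample` and `SYMBOLS`, the addresses, and the sample stacks are all hypothetical; a plain dictionary stands in for the B-tree lookup of step 2313, and each stack is stored leaf first so that `stack[0]` plays the role of call_stack_address[1].

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.base = 0        # samples in which this routine was the one interrupted
        self.cum = 0         # samples in which this call stack prefix appeared
        self.children = {}   # routine name -> child Node

def add_sample(root, stack, symbols):
    """Merge one call stack unwind record into the tree rooted at `root`."""
    ptr = root                                   # step 2304
    index = len(stack)                           # step 2306
    while index != 0:                            # step 2308
        sample_address = stack[index - 1]        # step 2312
        name = symbols[sample_address]           # step 2313 (dict, not a B-tree)
        child = ptr.children.get(name)           # step 2314
        if child is None:                        # step 2316
            child = ptr.children[name] = Node(name)
        child.cum += 1                           # step 2318
        if sample_address == stack[0]:           # step 2320
            child.base += 1                      # step 2322
        ptr = child                              # step 2324
        index -= 1                               # step 2326

# Hypothetical symbol addresses and four samples, each listed leaf first.
SYMBOLS = {0x100: "C", 0x200: "A", 0x300: "B"}
samples = [
    [0x100],
    [0x200, 0x100],
    [0x300, 0x200, 0x100],
    [0x300, 0x200, 0x100],
]
root = Node("pid_tid")
for stack in samples:
    add_sample(root, stack, SYMBOLS)

c = root.children["C"]
a = c.children["A"]
b = a.children["B"]
print((c.base, c.cum), (a.base, a.cum), (b.base, b.cum))
```

In this sketch BASE and CUM are sample counts rather than times, in line with the remark that the "cumulative" time of a sample-based record is the number of times a call stack configuration has been processed.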

In the depicted example in FIG. 23, the process is used to 
process call stack unwind records recorded during execution 
of a program. The illustrated process also may be imple- 
mented to dynamically process call stack unwind records 
during execution of a program. For example, step 2310 may 
be modified to wait until the next timer interrupt occurs and 
then continue to loop back to step 2310 at the next interrupt. 

The addresses obtained during sampling are used to 
identify functions. The functions are identified by mapping 
these addresses into functions. 

With reference now to FIG. 24, a flowchart depicts a 
process for identifying functions from an address obtained 
during sampling. The process begins by reading a program 
counter value that is obtained during sampling of the call 
stack (step 2400). A determination is made as to whether the 
end of file has been reached (step 2402). If the end of the file 
has not been reached, the program counter value is looked 
up in a global map (step 2404). A global map in the depicted
example is a map of system and per process symbols that is 
generated from system loader information and application, 
library, and system symbol tables. A process plus function id 
is obtained from the global map in response to looking up 
the program counter value (step 2406). Thereafter, the 
process returns to step 2400. 

The function information may be used in generating 
reports, such as those described below. The process in FIG. 
24 also may be used during execution of a program that is 
sampled. 
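
One plausible way to perform the lookup of steps 2404 and 2406 is a binary search over a table of symbol ranges sorted by start address. The Python sketch below is an assumption about how such a global map might be organised, not a description of the actual map format; the `GLOBAL_MAP` entries, the process and function names, and the `lookup` helper are hypothetical.

```python
import bisect

# Hypothetical global map: (start, end_exclusive, process, function), sorted by start.
GLOBAL_MAP = [
    (0x1000, 0x1080, "app", "C"),
    (0x1080, 0x1200, "app", "A"),
    (0x4000, 0x4100, "libfoo", "B"),
]
STARTS = [entry[0] for entry in GLOBAL_MAP]

def lookup(pc):
    """Return (process, function) for a program counter value, or None."""
    i = bisect.bisect_right(STARTS, pc) - 1   # last symbol starting at or before pc
    if i >= 0:
        start, end, process, function = GLOBAL_MAP[i]
        if pc < end:
            return process, function
    return None                                # pc falls outside every known symbol

for pc in (0x1090, 0x40FF, 0x3000):
    print(hex(pc), lookup(pc))
```

Keeping the table sorted makes each lookup logarithmic in the number of symbols, which matters when every sampled address must be resolved.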

With reference now to FIG. 25, a diagram of a
structured profile obtained using the processes of the present 
invention is illustrated. Profile 2500 shows sample numbers 
in column 2502. Column 2504 shows the call stack with an 
identification of the functions present within the call stack at 
different sample times. 

With reference now to FIG. 26, a diagram of a record 
generated using the processes of the present invention is
depicted. Each routine in record 2600 is listed separately, 
along with information regarding the routine in FIG. 26. For 
example, Sample column 2602 identifies the sample num- 
ber. Next, Calls column 2604 lists the number of times each 
routine has been called. BASE column 2606 contains the 
total time spent in the routine, while CUM column 2608 
includes the cumulative time spent in the routine and all 
routines called by the routine. CUM2 2610 is the cumulative 
time plus time spent in the recursive routines. Name column 
2612 contains the name of the routine. 

With reference now to FIG. 27, a diagram of another type 
of report that may be produced is depicted. The report 
depicted in FIG. 27 illustrates much of the same information 
found in FIG. 26, but in a slightly different format. As with
FIG. 26, diagram 2700 includes information on calls, base 
time, and cumulative time. 

FIG. 27 shows a sample-based trace output containing 
times spent within various routines as measured in micro- 
seconds. FIG. 27 contains one stanza (delimited by rows of 
equal signs) for each routine that appears in the sample- 
based trace output. The stanza contains information about 
the routine itself on the line labeled "Self", about who called
it on lines labeled "Parent", and about who the routine called 
on lines labeled "Child". The stanzas are in order of cum 
time. The second stanza is about routine A, as indicated by
the line beginning with "Self." The numbers on the "Self"
line of each stanza show that routine A was called three
times in this trace, once by routine C and twice by routine
B. In the profile terminology, routines C and B are
(immediate) parents of routine A. Routine A is a child of
routines C and B. All the numbers on the "Parent" rows of
the second stanza are breakdowns of routine A's
corresponding numbers. Three microseconds of the seven microsecond
total base time spent in A was when it was called by routine
C, and three microseconds when it was first called by routine
B, and another one microsecond when it was called by
routine B for a second time. Likewise, in this example, half
of routine A's fourteen microsecond cum time was spent on
behalf of each parent.

Routine C called routine B and routine A once each. All
the numbers on "Child" rows are subsets of numbers from
the child's profile. For example, of the three calls to routine
A in this trace, one was by routine C; of routine A's seven
microsecond total base time, three microseconds were while
it was called directly by routine C; of routine A's fourteen
microsecond cum time, seven microseconds was on behalf
of routine C. Notice that these same numbers are the first
row of the second stanza, where routine C is listed as one of
routine A's parents.

The four relationships that are true of each stanza are
summarized at the top of FIG. 27. First, the sum of the
numbers in the Calls column for parents equals the number
of calls on the self row. Second, the sum of the numbers in
the Base column for parents equals Self's base. Third, the
sum of the numbers in the Cum column for parents equals
Self's Cum. These first three invariants are true because
these characteristics are the definition of Parent; collectively
they are supposed to account for all of Self's activities.
Fourth, the Cum in the Child rows accounts for all of Self's
Cum except for its own Base.

Program sampling contains information from the call
stack and provides a profile, reflecting the sampling of an
entire call stack, not just the leaves. Furthermore, the
sample-based profiling technique may also be applied to
other types of stacks. For example, with Java programs, a
large amount of time is spent in a subroutine called the
"interpreter". If only the call stack was examined, the profile
would not reveal much useful information. Since the
interpreter also tracks information in its own stack, e.g., a Java
stack (with its own linkage conventions), the process can be
used to walk up the Java stack to obtain the calling sequence
from the perspective of the interpreted Java program.

As noted previously, both sample-based profiling and
event-based profiling have problems. The primary problem
related to a sample-based profiling approach, such as using
calling sequence stacks, is that it is not always possible to
walk calling sequence stacks due to a variety of reasons.
First, any routines that have been coded with assembly
language code may not follow the normal call/return
conventions. Second, the system may support noncontiguous
stacks. Third, existing, legacy code may have 16 bit glue
code versus more current code with 32 bit glue code. Fourth,
the context may change from interpreted Java to C, or from
C to Java, with a fairly complex set of return algorithms.
This is complicated by calls from Java to native code with
a variety of methodologies for calling native code. Fifth,
walking the stack may be difficult due to being on the timer

method entries and method exits, can generate an amount of data
that can be tremendous. Although there are system changes
that can be made to instrument the Java code to supply
entry/exit hooks, there is no simple way to have entry/exit
hooks in all of the non-Java code, including C and assembly
language code. In fact, providing a private build with this
support has the disadvantage of adding significant overhead
related to CPU time simply to determine if trace data needs
to be [illegible], which becomes especially problematic if the
hooks are added indiscriminately.

In order to avoid the problems associated with a pure
stack-walking-based implementation and a pure method/
trace-based implementation and to obtain the advantages of
both event-based profiling and sample-based profiling, the
present invention provides an integration of both modes of
operation. This integration can provide for selectively
instrumented stack walks and for timer-based stack walks. The
selectively instrumented stack walks are especially useful
for debugging and path analysis. The approach in this
invention provides for greater flexibility in instrumentation
[illegible]. Depending on the coverage available in
[illegible] contextual information to
help resolve the sample data. This is particularly useful in
attempting to build a bridge of understanding between Java
code and native code. For example, an effective debugging
aid is using method entry and exits in conjunction with a
stack walking hook placed inside an assembler routine that
[illegible].

With reference now to FIG. 28, a flowchart depicts the
processing of a trace file that contains both event-based and
sample-based profiling information. The process shown in
FIG. 28 is similar to the overall trace processing shown in
FIGS. 5-6 except the trace file in FIG. 28 has been
extended to include sample-based profiling information.

The process begins by opening the trace file that contains
event-based profiling information merged with
sample-based profiling information (step 2802). A profile record is
read from the file (step 2804), and a determination is made
as to whether the profile record is an event-based record,
such as method entry/exit trace records (step 2806). If so,
then the event-based record is processed (step 2808), and a
determination is made as to whether there are more records
to be processed in the trace file (step 2816). If so, then the
process loops back to step 2804. If there are no more
records, then a report is generated containing the processed
event-based information and the processed sample-based
information in the same report (step 2818), and the process
terminates.

Referring back to step 2806, if the record that was read
from the trace file was not an event-based record, then a
determination is made as to whether the record is a
sample-based record, such as a stack-walking or call stack unwind
record (step 2810). If so, then the sample-based record is
processed (step 2812), and the process continues by
checking for remaining records in step 2816. If the next record was
not a sample-based record, then an error message is
generated for the unrecognized record as the trace file is assumed
to contain one of the two types of profiling records, either
event-based profiling records or sample-based profiling
records.

With reference now to FIG. 29, a figure depicts a report
generated from a trace file containing both event-based

tick and being unable to access real memory, i.e. being profiling information (method entry/exits) and sample-based 

paged out. 65 profiling information (stack unwind). FIG. 29 is similar to 

The primary problem related to an event-based profiling FIG. 20, in which a call stack tree is presented as a report, 

approach is that providing event trace records, such as on except that FIG. 29 contains embedded stack walking infor- 
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mation. Call stack iree 2900 contains two stack unwinds lected major code and minor code that have been used to 

generated within the time period represented by the total of instrument the software. 

342 ticks. Stack unwind identifier 2902 denotes the begin- The process begins when an application is initiated with 
ning of stack unwind information 2906, with the names of an option to execute the application with automatic, real- 
routines that are indented to the right containing the slack 5 time instrumentation as specified by a major code and minor 
information that the stack walking process was able to code (step 3102). Using the map of a dynamic load library 
discern. Stack unwind identifier 2904 denotes the beginning (DLL), one can determine the start of a routine to be 
of stack unwind information 2908. In this example, "I:" instrumented. A utility takes, as input, the memory map, the 
identifies an interpreted method and "F:" identifies a native namc 0 f the routine to be patched or updated, and the 
function, such as a native function within JavaOS. A call 10 profiling function that is desired. The utility either patches 
from a Java method to a native method is via "ExecuteJava" the file corresponding to the map name on disk or its loaded 
Hence, at the point at which the stack walking process version. 

reaches a stack frame for an "ExecuteJava," it cannot ^ instruction at a memory address AddressLabelA, 

proceed any further up the stack as the stack frames are ^^jy mc start of ^ ^ updatcd ^th ^ mt3 

discontinued. The process for creating a tree containing both 15 interrupt, and the original instruction is saved in a lookup 

event-based nodes and sample-based nodes is described in table (stcp 3104) ^ utility rcme mbers the byte that is 

more detail further below In this case, identifiers 2902 and replaced with the int3 interrupt and its location within the 

2904 also denote the major code associated with the stack file or ^thin memory. The utility then takes over the 

unwind. Major codes are described in further detail below. software interrupt vector and allows execution to continue. 

With reference now to FIG. 30, a table depicts major 20 For example, if the application is being loaded, the main 

codes and minor codes that may be employed to instrument routine of the application may be invoked (step 3106). 

software modules for profiling. In order to facilitate the The method in the application that contains the updated 

merging of event-based profiling information and sample- instruction is eventually invoked (step 3108), and the int3 

based profiling information, a set of codes may be used to interrupt is asserted (step 3110). The int3 interrupt handler 

turn on and off various types of profiling functions. 25 ^ determines that the address AddressLabelA has caused 

For example, as shown in FIG. 30, the minor code for a the interrupt (step 3112) by using the program counter. The 

stack unwind is designated as 0x7ffffrTf, which may be used int3 interrupt handler determines that the interrupt was 

for two different purposes. The first purpose, denoted with a caused by a real-time insertion of an interrupt and performs 

major code of 0x40, Is for a stack unwind during a timer an internal table lookup to determine the type of profiling to 

interrupt. When this information is output into a trace file, 30 be performed for the currently interrupted routine (step 

the stack information that appears within the file will have 3114). Profiling actions are then performed (step 3116). For 

been coded so that the stack information is analyzed as example, if the table indicated a stack unwind as the desired 

sample -based profiling information. The second purpose, type of profiling, then the stack walking process is invoked, 

denoted with a major code of 0x41, is for a stack unwind in which will identify the interrupted routine as the first entry 

an instrumented routine. This stack information could then 35 in the stack. 

be post-processed as event-based profiling information. The inG interrupt handler then updates the method code 

Other examples in the table show a profile or major code with the original code (step 3118) and issues a single step 

purpose of tracing jitted methods with a major code value of interrupt to execute the original or replaced code (step 

0x50. Tracing of jitted methods may be distinguished based 3120). Similar to the fielding of the int3 interrupt, a utility 

on the minor code that indicates method invocation or may take over the single step interrupt vector and field the 

method exit. In contrast, a major code of 0x30 indicates a interrupt. At the point that the single step interrupt is fielded, 

profiling purpose of instrumenting interpreted methods, the routine has been executed in single step mode. The 

while the minor code again indicates, with the same values, interrupt handler then updates the instruction at address 

method invocation or method exit. 45 AddressLabelA by inserting the int3 interrupt again (step 

Referring back to FIG. 29, the connection can be made 3122), and the interrupted method is allowed to continue 

between the use of major and minor codes, the instrumen- normally (step 3124). 

tation of code, and the post -processing of profile informa- With reference now to FIGS. 31B-C, examples of 

tion. In the generated report shown in FIG. 29, the stack pseudo-assembly language code depict the changes required 

unwind identifiers can be seen to be equal to 0x40, which 50 for inserting profile books into specific routines in real-time 

according to the table in FIG. 30, is a stack unwind gener- by updating the code for a software interrupt. FIG. 31B 

ated in response to a timer interrupt. This type of stack shows a set of generic instructions before alteration. As 

unwind may have occurred in response to a regular interrupt noted previously, these types of hooks are generally placed 

that was created in order to generate a sampled profile of the at the beginning of a routine, but for illustration in FIG. 31 C, 

executing software. 55 an int3 interrupt is shown being embedded within a routine. 

As noted in the last column of the table in FIG. 30, by With this methodology, the stack unwind may occur at 

using a utility that places a hook into a software module to selected points in the code without actually modifying the 

be profiled, a stack unwind may be instrumented into a source code. The same approach may be used with jitted 

routine. If so, the output for this type of stack unwind will code if the utility has access to all of the hooks which 

be designated with a major code of 0x41. <s 0 identify the placement of the jitted code. 

With reference now to FIG. 31A, a flowchart depicts a FIGS. 31A-C describes a manner in which a module may 

process for inserting profile hooks into specific routines in be instrumented in real-time with profiling hooks, and these 

real-time by updating the code for a software interrupt. hooks may use the major code and minor code distinctions 

However, when the interrupt is fielded, it may be used to as explained with respect to FIG. 30. An example of a report 

generate a variety of profiling information, such as stack 65 that shows calling structure between routines is shown in 

unwind information. The type of profiling information that FIG. 29, which also showed the use of a major code in 

is to be generated may be determined based upon a prese- distinguishing some of the trace information. However, the 
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manner in which the report was generated, i.e. the manner 
which the event-based profiling information and sample - 
based profiling information may be merged into a single data 
structure, is described with respect to FIGS. 32A-33. 
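
The hook insertion and fielding sequence described with respect to FIG. 31A may be easier to follow as a small simulation. The following Python sketch is illustrative only and is not taken from the figures: the names HookTable, install, on_int3 and run are hypothetical, memory is modeled as a byte array in which every byte is treated as one instruction, and the single step is reduced to reading back the restored byte. The value 0xCC is the one-byte x86 int3 opcode, and 0x41 is the major code given above for a stack unwind in an instrumented routine.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode


class HookTable:
    """Simulated patch utility; memory is a bytearray, one byte per instruction."""

    def __init__(self, memory):
        self.memory = memory
        self.table = {}      # address -> (original byte, major code, action)
        self.profile = []    # output produced by the profiling actions

    def install(self, address, major_code, action):
        # Step 3104: save the original byte, then overwrite it with int3.
        self.table[address] = (self.memory[address], major_code, action)
        self.memory[address] = INT3

    def on_int3(self, address):
        # Steps 3112-3116: look up the interrupting address and profile.
        original, major_code, action = self.table[address]
        self.profile.append((major_code, action(address)))
        # Step 3118: restore the original instruction.
        self.memory[address] = original
        # Step 3120: "single step" the restored instruction (simplified).
        executed = self.memory[address]
        # Step 3122: re-insert int3 so the hook fires on the next call.
        self.memory[address] = INT3
        return executed


def run(hooks, start, end):
    """Execute bytes in [start, end), fielding int3 at hooked addresses."""
    executed = []
    for pc in range(start, end):
        opcode = hooks.memory[pc]
        if opcode == INT3 and pc in hooks.table:
            opcode = hooks.on_int3(pc)
        executed.append(opcode)
    return executed


# Hypothetical routine body; the hook is placed at its first instruction.
memory = bytearray([0x55, 0x89, 0xE5, 0x90, 0xC3])
hooks = HookTable(memory)
hooks.install(0, 0x41, lambda address: f"stack unwind at {address:#x}")
first_run = run(hooks, 0, len(memory))
second_run = run(hooks, 0, len(memory))
```

In this sketch the routine executes its original instructions on every call, while the patched byte stays in place between calls, which is the behavior the restore, single step, and re-insert steps are meant to achieve.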

With respect to FIGS. 32A-32C, a series of tree structures 
generated from events and stack unwinds is depicted. 

A stack unwind may assume that the type of routines that 
will appear within a stack adhere to a common type of 
routine-calling methodology. The types of routines that one 
would expect to find in a stack unwind include jitted 
methods or native methods, which would typically appear in 
the event trace output via the entry/exit tracing. However, 
there may be some routines in a list of routines gathered 
from a stack unwind that do not appear in a list of routines 
gathered from an event trace. With the exception of inter- 
preted code, one would not expect to have any routines in an 
event-based tree structure gathered from trace records that 
are not found in the call stack tree structure. One might only 
expect a one-to-one correspondence between the stack 
unwinds and method entry/exits if every method were 
instrumented with an entry and exit hook. Hence, the 
problem of matching routines from event records and rou- 
tines from stack walking records is reduced to the problem 
of merging a stack unwind that contains every routine that 
might appear in the method entries/exits. 

FIGS. 32A-32B provide an example of the discrepancies
that may be found between a stack unwind and entry/exit trace
records. FIG. 32B contains each routine that may be found
in FIG. 32A. According to tree 3200, routine Event1 has
called routine Event2. According to tree 3210, routine
Event1 has called routine SampleA, which has called routine
SampleB, which has called routine Event2, which has called
routine SampleC, which called SampleD, which called
SampleE. These trees are similar to the type of trees shown in
FIGS. 19A-19B and built according to methods described in 
FIGS. 21-23. In order to provide an integrated report of 
event and sampled data that shows calling structure, as 
shown in FIG. 29, the two sets of information shown in FIG. 
32A and FIG. 32B must be merged. 

Rather than inserting the sampled nodes between two 
event nodes, a sampled sequence is chained or connected to 
an event node. FIG. 32C shows tree 3220 in which the
sampled nodes SampleC, SampleD, and SampleE have been
connected to event node Event2 and in which the sampled
nodes SampleA and SampleB have been connected to event
node Event1. In this manner, the original calling sequence
configuration according to the event trace records is
preserved while the sample-derived calling sequence is
appended to provide more information about the sequence of
calls than could be provided by the event-based tree alone.
The process of building the merged tree is described further 
below. 

Referring back to FIG. 19A, the data structure for each 
node in the tree is shown. In order to support the merged tree 
shown in FIG. 32C, the data structure for each node may be 
updated so that it includes a flag or other type of indicator 
identifying the node as a sampled node and providing a 
variable for the storage of the number of sampled
occurrences, i.e. the number of times that the call stack
configuration from a stack unwind within the trace output
matched the configuration as represented by a particular 
node in the tree. 
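
One possible shape for such an extended node is sketched below in Python. The sketch is an assumption rather than the structure of FIG. 19A: the field names calls, base and cum merely echo the report columns mentioned earlier, while is_sampled, sample_count, child and record_sample are hypothetical names introduced here for the sampled-node flag and the count of sampled occurrences.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Call stack tree node extended with sampled-node bookkeeping."""
    name: str
    calls: int = 0                # event-based statistics (assumed names)
    base: int = 0
    cum: int = 0
    is_sampled: bool = False      # flag marking a node derived from a stack unwind
    sample_count: int = 0         # number of sampled occurrences
    children: dict = field(default_factory=dict)

    def child(self, name, sampled=False):
        """Return the child with this name, creating it if it does not exist."""
        node = self.children.get(name)
        if node is None:
            node = Node(name, is_sampled=sampled)
            self.children[name] = node
        return node

    def record_sample(self):
        """Count one more stack unwind that matched this node."""
        self.sample_count += 1


# Hypothetical usage: a sampled node chained to an event node.
event1 = Node("Event1")
sample_a = event1.child("SampleA", sampled=True)
sample_a.record_sample()
sample_a.record_sample()
```

Keeping the flag on the node itself lets a report generator distinguish sampled rows from event rows without consulting a second structure.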

With reference now to FIG. 33, a flowchart depicts a 
method by which a tree containing merged sample data and 
event data is constructed. The algorithms discussed above
for creating the event-based tree remain unchanged, but
sample-based processing proceeds as shown in FIG. 33.
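
The split between the unchanged event path and the separate sample path can be illustrated with the record dispatch described with respect to FIG. 28. The following Python sketch is a minimal illustration under stated assumptions: the record layout (major code, minor code, payload), the function name process_trace, and the minor code value 1 used for a method entry are hypothetical. The major codes 0x30, 0x40, 0x41 and 0x50 and the minor code 0x7fffffff are the values given above with respect to FIG. 30.

```python
SAMPLE_MAJORS = {0x40}              # stack unwind during a timer interrupt
EVENT_MAJORS = {0x30, 0x41, 0x50}   # interpreted methods, instrumented unwind, jitted methods


def process_trace(records):
    """Route each trace record to event or sample processing."""
    report = {"event": [], "sample": [], "errors": []}
    for major, minor, payload in records:          # read the next record
        if major in EVENT_MAJORS:                  # event-based record?
            report["event"].append(payload)
        elif major in SAMPLE_MAJORS:               # sample-based record?
            report["sample"].append(payload)
        else:                                      # neither: report an error
            report["errors"].append(f"unrecognized major code {major:#x}")
    return report                                  # one report for both kinds


# Hypothetical trace contents.
records = [
    (0x30, 1, "enter Event1"),
    (0x40, 0x7FFFFFFF, ["Event2", "Event1"]),
    (0x41, 0x7FFFFFFF, ["Event1"]),
    (0x99, 0, None),
]
report = process_trace(records)
```

A real post-processor would build tree nodes rather than append payloads to lists; the lists here only make the routing decision visible.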
The process begins by converting the sampled call stack
into a sampled call stack array of routine names (step 3302)
in which the first entry is the interrupted routine and the last
entry is the initial routine or as close to the "root caller" as
is available. The conversion of memory addresses into routine
names is described above. An entry from the array of names
is retrieved (step 3304), and a search is made to determine
whether the name occurs in the current event stack tree (step
3306). Once the name occurs in the event stack tree, a set of
"sampled nodes" consisting of a subtree is connected to the
event-based tree (step 3308), and the new subtree represents
the entries in the sampled call stack array up to the presently
matching node.
A determination is then made as to whether there are other
entries in the sampled call stack array that have yet to be
processed (step 3310). If so, then the process loops back to
process another entry for a match in the event tree. If not,
then the process completes. When a sampled call stack array
entry is connected or chained to a node in an event tree, the
sampled node flags and other associated information in the
node, as described above, are updated.
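
The chaining steps of FIG. 33 can be sketched in Python using the routine names of FIGS. 32A-32C. This is a simplified illustration, not the claimed method: the names Node, find_event_node, merge_sample and paths are hypothetical, the search covers the whole event tree rather than only a current event stack, every node of a chained subtree has its count incremented, and entries left over after the last match are simply returned. Each of these choices is an assumption made for the sketch.

```python
class Node:
    def __init__(self, name, sampled=False):
        self.name = name
        self.sampled = sampled    # True for nodes derived from a stack unwind
        self.samples = 0          # number of sampled occurrences
        self.children = {}

    def child(self, name, sampled=False):
        if name not in self.children:
            self.children[name] = Node(name, sampled)
        return self.children[name]


def find_event_node(root, name):
    """Depth-first search for an event (non-sampled) node by routine name."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name == name and not node.sampled:
            return node
        stack.extend(node.children.values())
    return None


def merge_sample(root, sampled_stack):
    """sampled_stack lists the interrupted routine first, the root caller last."""
    pending = []                              # entries seen since the last match
    for name in sampled_stack:                # retrieve the next entry
        match = find_event_node(root, name)   # does it occur in the event tree?
        if match is None:
            pending.append(name)
            continue
        node = match
        for callee in reversed(pending):      # chain in caller-to-callee order
            node = node.child(callee, sampled=True)
            node.samples += 1
        pending = []
    return pending                            # entries that found no event node


def paths(node, prefix=()):
    """Yield every root-to-node path as a tuple of routine names."""
    here = prefix + (node.name,)
    yield here
    for child in node.children.values():
        yield from paths(child, here)


# Event tree of FIG. 32A and the stack unwind of FIG. 32B.
root = Node("Event1")
root.child("Event2")
unmatched = merge_sample(
    root,
    ["SampleE", "SampleD", "SampleC", "Event2", "SampleB", "SampleA", "Event1"],
)
merged = set(paths(root))
```

With these inputs the event path from Event1 to Event2 is left intact, SampleC, SampleD and SampleE hang below Event2, and SampleA and SampleB hang below Event1, matching the arrangement described for tree 3220.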

It is important to note that while the present invention has 
been described in the context of a fully functioning data 
processing system, those of ordinary skill in the art will 
appreciate that the processes of the present invention are
capable of being distributed in the form of a computer readable
medium of instructions and a variety of forms and that the
present invention applies equally regardless of the particular
type of signal bearing media actually used to carry out the
distribution. Examples of computer readable media include
recordable-type media such as a floppy disc, a hard disk drive,
a RAM, and CD-ROMs and transmission-type media such 
as digital and analog communications links. 

The description of the present invention has been
presented for purposes of illustration and description, but is not
intended to be exhaustive or limited to the invention in the
form disclosed. Many modifications and variations will be
apparent to those of ordinary skill in the art. For example,
the present invention may be applied to other interpreted
programming systems and environments other than Java.
The embodiment was chosen and described in order to best
explain the principles of the invention and the practical
application, and to enable others of ordinary skill in the art to
understand the invention for various embodiments with
various modifications as are suited to the particular use
contemplated.

It is important to note that while the present invention has 
been described in the context of a single JVM active in an 
operating system, there is no constraint to its application to 
multiple JVMs. This generalization is well within the means 
of those with ordinary skill in the art. 
What is claimed is: 

1. A process in a data processing system for profiling an 
instrumented program executing in the data processing 
system, the process comprising: 

recording trace data in response to an occurrence of a 

selected event within the instrumented program; 
detecting an occurrence of a selected interrupt; 
identifying a call stack associated with the instrumented 
program in response to the detection of the selected 
interrupt; 

examining the call stack to identify each routine that is 
currently executing in association with the instru- 
mented program; and 
recording additional trace data which includes an indica- 
tion of each currently executing routine. 

2. The process of claim 1 further comprising: processing 
the trace data to identify a thread and method executed for 
each indication. 
3. The process of claim 2 wherein the step of processing
the trace data comprises identifying trace data in real-time as 
it is recorded during execution of the instrumented program. 

4. The process of claim 2 wherein the step of processing 
the trace data comprises identifying trace data obtained from 
a trace file containing one or more trace events stored in the 
trace file. 

5. The process of claim 1 further comprising: 
generating trace data based on a major code and a minor 

code, wherein the major code and the minor code 
provide selection of profiling functions. 

6. The process of claim 5 wherein the major code and the 
minor code select a call stack unwind profiling function to 
be executed in response to the occurrence of the selected 
interrupt. 

7. The process of claim 5 wherein the major code and the 
minor code select a call stack unwind profiling function to 
be executed in response to the occurrence of the selected 
event within the instrumented program. 

8. The process of claim 1 further comprising: 
instrumenting the program with a real-time insertion of an 

interrupt instruction into the instrumented program for 
generating an interrupt.

9. The process of claim 8 further comprising: 
reading an original instruction at an address in the pro- 
gram; 

storing the original instruction in a lookup table with the 
address and an associated profiling function; 

replacing the original instruction with the interrupt 
instruction; 

fielding an interrupt generated by the interrupt instruction 

at the address; 
searching the lookup table using the address; and 
performing the associated profiling function. 

10. The process of claim 9 further comprising: 
restoring the original instruction at the address; 
issuing a single step interrupt to execute the original 

instruction; and 
restoring the interrupt instruction at the address. 

11. An apparatus for profiling an instrumented program 
executing in a data processing system, the apparatus com- 
prising: 

first recording means for recording trace data in response 
to an occurrence of a selected event within the instru- 
mented program; 

detecting means for detecting an occurrence of a selected 
interrupt; 

identifying means for identifying a call stack associated 

with the instrumented program in response to the 

detection of the selected interrupt; 
examining means for examining the call stack to identify 

each routine that is currently executing in association 

with the instrumented program; and 
second recording means for recording additional trace 

data which includes an indication of each currently 

executing routine. 

12. The apparatus of claim 11 further comprising: 
processing means for processing the trace data to identify 

a thread and method executed for each indication. 

13. The apparatus of claim 12 wherein the processing 
means comprises identifying means for identifying trace 
data in real-time as it is recorded during execution of the 
instrumented program. 

14. The apparatus of claim 12 wherein the processing 
means comprises identifying means for identifying trace 
data obtained from a trace file containing one or more trace
events stored in the trace file. 

15. The apparatus of claim 11 further comprising:
generating means for generating trace data based on a 

major code and a minor code, wherein the major code 
and the minor code provide selection of profiling 
functions. 

16. The apparatus of claim 15 wherein the major code and 
the minor code select a call stack unwind profiling function 
to be executed in response to the occurrence of the selected 
interrupt. 

17. The apparatus of claim 15 wherein the major code and 
the minor code select a call stack unwind profiling function 
to be executed in response to the occurrence of the selected 
event within the instrumented program. 

18. The apparatus of claim 11 further comprising:
instrumenting means for instrumenting the program with 

a real-time insertion of an interrupt instruction into the 
program for generating an interrupt. 

19. The apparatus of claim 18 further comprising: 
reading means for reading an original instruction at an 

address in the program; 
storing means for storing the original instruction in a 
lookup table with the address and an associated profil- 
ing function; 

replacing means for replacing the original instruction with 

the interrupt instruction; 
fielding means for fielding an interrupt generated by the 

interrupt instruction at the address; 
searching means for searching the lookup table using the 

address; and 

performing means for performing the associated profiling 
function. 

20. The apparatus of claim 19 further comprising: 

first restoring means for restoring the original instruction 
at the address; 

issuing means for issuing a single step interrupt to execute 
the original instruction; and 

second restoring means for restoring the interrupt instruc- 
tion at the address. 

21. A computer program product on a computer readable 
medium for use in a data processing system for profiling an 
executing program, the computer program product compris- 
ing: 

first instructions for recording trace data in response to an 
occurrence of a selected event within the instrumented 
program; 

second instructions for detecting an occurrence of a 

selected interrupt; 
third instructions for identifying a call stack associated 

with the instrumented program in response to the 

detection of the selected interrupt; 
fourth instructions for examining the call stack to identify 

each routine that is currently executing in association 

with the instrumented program; and 
fifth instructions for recording additional trace data which 

includes an indication of each currently executing 

routine. 

22. The computer program product of claim 21 further 
comprising: 

instructions for processing the trace data to identify a 
thread and method executed for each indication. 

23. The computer program product of claim 22 wherein 
the instructions for processing comprise instructions for
identifying trace data in real-time as it is recorded during
execution of the instrumented program. 

24. The computer program product of claim 22 wherein 
the instructions for processing comprise instructions for
identifying trace data obtained from a trace file containing
one or more trace events stored in the trace file. 

25. The computer program product of claim 21 further 
comprising: 

instructions for generating trace data based on a major 
code and a minor code, wherein the major code and the
minor code provide selection of profiling functions. 

26. The computer program product of claim 25 wherein 
the major code and the minor code select a call stack unwind 
profiling function to be executed in response to the occur-
rence of the selected interrupt.

27. The computer program product of claim 25 wherein 
the major code and the minor code select a call stack unwind 
profiling function to be executed in response to the occur- 
rence of the selected event within the instrumented program. 

28. The computer program product of claim 21 further
comprising: 

instructions for instrumenting the program with a real- 
time insertion of an interrupt instruction into the instru- 
mented program for generating an interrupt. 

29. The computer program product of claim 28 further 
comprising: 

instructions for reading an original instruction at an 

address in the program; 
instructions for storing the original instruction in a lookup

table with the address and an associated profiling 

function; 

instructions for replacing the original instruction with the 
interrupt instruction; 
instructions for fielding an interrupt generated by the

interrupt instruction at the address; 
instructions for searching the lookup table using the 

address; and 

instructions for performing the associated profiling func- 
tion. 

30. The computer program product of claim 29 further 
comprising: 

instructions for restoring the original instruction at the 
address; 

instructions for issuing a single step interrupt to execute 

the original instruction; and 
instructions for restoring the interrupt instruction at the 

address. 

31. A data processing system comprising: 
a bus system; 

a communications unit connected to the bus system;

a memory connected to the bus system, wherein the 
memory includes a set of instructions; and 

a processing unit connected to the bus system, wherein the 
processing unit executes the set of instructions to 
record trace data in response to an occurrence of a 
selected event within an instrumented program; detect 
an occurrence of a selected interrupt; identify a call 
stack associated with the program in response to the 
detection of the selected interrupt; examine the call 
stack to identify each routine that is currently executing 
in association with the program; and record additional 
trace data which includes an indication of each cur- 
rently executing routine. 